This example shows how to deploy DeepSeek-R1-Distill-Llama 8B and 70B using [SGLang](https://github.com/sgl-project/sglang) :material-arrow-top-right-thin:{ .external }{:target="_blank"} and `dstack`.
??? info "Prerequisites"
    Once `dstack` is installed, go ahead and clone the repo, and run `dstack init`.

    <div class="termy">

    ```shell
    $ git clone https://github.com/dstackai/dstack
    $ cd dstack
    $ dstack init
    ```

    </div>
Here's an example of a service that deploys DeepSeek-R1-Distill-Llama 8B and 70B using SGLang.
=== "AMD"
<div editor-title="examples/deployment/sglang/amd/.dstack.yml">
```yaml
type: service
name: deepseek-r1-amd
image: lmsysorg/sglang:v0.4.1.post4-rocm620
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-70B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
resources:
gpu: MI300x
disk: 300GB
```
</div>
=== "NVIDIA"
<div editor-title="examples/deployment/sglang/nvidia/.dstack.yml">
```yaml
type: service
name: deepseek-r1-nvidia
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
commands:
- python3 -m sglang.launch_server
--model-path $MODEL_ID
--port 8000
--trust-remote-code
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
resources:
gpu: 24GB
```
</div>
To run a configuration, use the `dstack apply` command.
<div class="termy">

```shell
$ dstack apply -f examples/llms/deepseek/sglang/amd/.dstack.yml

 #  BACKEND  REGION   RESOURCES                        SPOT  PRICE
 1  runpod   EU-RO-1  24xCPU, 283GB, 1xMI300X (192GB)  no    $2.49

Submit the run deepseek-r1-amd? [y/n]: y

Provisioning...
---> 100%
```

</div>
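The NVIDIA configuration can be deployed the same way, pointing `dstack apply` at the NVIDIA file shown above:

<div class="termy">

```shell
$ dstack apply -f examples/llms/deepseek/sglang/nvidia/.dstack.yml
```

</div>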
Once the service is up, the model will be available via the OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`. The example below assumes the `dstack` server runs locally on its default port `3000` and the project is named `main`.
<div class="termy">

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "stream": true,
      "max_tokens": 512
    }'
```

</div>
When a gateway is configured, the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
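For instance, here's a sketch of the same request sent through a gateway, assuming a hypothetical gateway domain of `example.com` (the standard OpenAI `chat/completions` path is assumed to apply):

<div class="termy">

```shell
$ curl https://gateway.example.com/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
      "messages": [
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "stream": true,
      "max_tokens": 512
    }'
```

</div>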
The source code of this example can be found in [`examples/llms/deepseek/sglang`](https://github.com/dstackai/dstack/blob/master/examples/llms/deepseek/sglang) :material-arrow-top-right-thin:{ .external }{:target="_blank"}.