---
title: NVIDIA NIM
description: This example shows how to deploy DeepSeek-R1-Distill-Llama-8B to any cloud or on-premises environment using NVIDIA NIM and dstack.
---

# NVIDIA NIM

This example shows how to deploy DeepSeek-R1-Distill-Llama-8B using [NVIDIA NIM](https://docs.nvidia.com/nim/):material-arrow-top-right-thin:{ .external }{:target="_blank"} and `dstack`.

??? info "Prerequisites"
    Once `dstack` is installed, go ahead and clone the repo, then run `dstack init`.

    <div class="termy">

    ```shell
    $ git clone https://github.com/dstackai/dstack
    $ cd dstack
    $ dstack init
    ```

    </div>

## Deployment

Here's an example of a service that deploys DeepSeek-R1-Distill-Llama-8B using NIM.

```yaml
type: service
name: serve-distill-deepseek

image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
env:
  - NGC_API_KEY
  - NIM_MAX_MODEL_LEN=4096
registry_auth:
  username: $oauthtoken
  password: ${{ env.NGC_API_KEY }}
port: 8000
# Register the model
model: deepseek-ai/deepseek-r1-distill-llama-8b

# Uncomment to leverage spot instances
#spot_policy: auto

# Cache downloaded models
volumes:
  - instance_path: /root/.cache/nim
    path: /opt/nim/.cache
    optional: true

resources:
  gpu: A100:40GB
  # Uncomment if using multiple GPUs
  #shm_size: 16GB
```

## Running a configuration

To run a configuration, use the `dstack apply` command.

<div class="termy">

```shell
$ NGC_API_KEY=...
$ dstack apply -f examples/deployment/nim/.dstack.yml

 #  BACKEND  REGION  RESOURCES                    SPOT  PRICE
 1  vultr    ewr     6xCPU, 60GB, 1xA100 (40GB)   no    $1.199
 2  vultr    ewr     6xCPU, 60GB, 1xA100 (40GB)   no    $1.199
 3  vultr    nrt     6xCPU, 60GB, 1xA100 (40GB)   no    $1.199

Submit the run serve-distill-deepseek? [y/n]: y

Provisioning...
---> 100%
```

</div>

If no gateway is created, the model will be available via the OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>/`.

<div class="termy">

```shell
$ curl http://127.0.0.1:3000/proxy/models/main/chat/completions \
    -X POST \
    -H 'Authorization: Bearer <dstack token>' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "deepseek-ai/deepseek-r1-distill-llama-8b",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "What is Deep Learning?"
        }
      ],
      "max_tokens": 128
    }'
```

</div>
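Since the endpoint is OpenAI-compatible, any OpenAI client works as well. Below is a minimal sketch using the `openai` Python package, assuming the `dstack` server runs at `http://127.0.0.1:3000` and the project is named `main`; the token placeholder stands in for your `dstack` user token.

```python
from openai import OpenAI

# Assumptions: dstack server at 127.0.0.1:3000, project "main",
# and a valid dstack token in place of the placeholder below.
client = OpenAI(
    base_url="http://127.0.0.1:3000/proxy/models/main",
    api_key="<dstack token>",
)

completion = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1-distill-llama-8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Deep Learning?"},
    ],
    max_tokens=128,
)

print(completion.choices[0].message.content)
```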

When a gateway is configured, the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>/`.
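The same client sketch works through a gateway by switching the base URL; here `example.com` stands in for a hypothetical gateway domain:

```python
from openai import OpenAI

# Assumption: example.com is a placeholder for your actual gateway domain.
client = OpenAI(
    base_url="https://gateway.example.com",
    api_key="<dstack token>",
)
```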

## Source code

The source code of this example can be found in [`examples/deployment/nim`](https://github.com/dstackai/dstack/blob/master/examples/deployment/nim) :material-arrow-top-right-thin:{ .external }{:target="_blank"}.

## What's next?

1. Check services.
2. Browse the DeepSeek AI NIM.