[Bug]: DeepSeek-R1-AWQ gets stuck with all tokens rejected when MTP is enabled. #13704

sgsdxzy opened this issue Feb 22, 2025 · 0 comments

Your current environment

The output of `python collect_env.py`

[Screenshots of `python collect_env.py` output]

🐛 Describe the bug

Run command:

```
vllm serve --enable-chunked-prefill --enable-prefix-caching -tp 8 cognitivecomputations/DeepSeek-R1-AWQ --dtype float16 --trust-remote-code --max-model-len 131072 --max-seq-len-to-capture 131072 --num-speculative-tokens 1
```
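For reference, a minimal streaming client to trigger the hang. This is illustrative only (not from the original report); it assumes the server above is reachable at vLLM's default OpenAI-compatible endpoint on `localhost:8000`.

```python
# Minimal streaming client to reproduce (illustrative; any OpenAI-compatible
# client works). Assumes the server started above listens on localhost:8000.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "cognitivecomputations/DeepSeek-R1-AWQ",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        # With --num-speculative-tokens 1, only the first chunk ever arrives;
        # the stream then stalls with no further tokens.
        print(line.decode())
```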

Symptom:
Upon receiving a request, only the first word (for example "Okay") is generated; generation then stalls and no new tokens are streamed.

[Screenshot of the console log]

As can be seen from the console log, the number of accepted tokens stays at 0 while the number of draft tokens keeps increasing.
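For context, a simplified sketch of the standard rejection-sampling acceptance rule behind speculative decoding (illustrative only, not vLLM's actual implementation): a draft token `t` is accepted with probability `min(1, p_target(t) / p_draft(t))`, so a target model that assigns near-zero probability to every MTP draft token would keep the accepted count at 0, consistent with the counters above.

```python
# Sketch of the rejection-sampling acceptance rule used in speculative
# decoding (illustrative only; not vLLM's code). A draft token t is accepted
# with probability min(1, p_target(t) / p_draft(t)); the first rejection
# discards that token and all later draft tokens in the window.
import random

def accept_draft_tokens(draft_tokens, p_draft, p_target, rng=random):
    accepted = []
    for tok, q, p in zip(draft_tokens, p_draft, p_target):
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)
        else:
            break
    return accepted

# If the target model assigns ~0 probability to the drafted token (e.g. a
# mismatched MTP head), the acceptance probability is ~0 and the accepted
# count stays at 0, as in the log above.
print(accept_draft_tokens([42], p_draft=[0.9], p_target=[1e-9]))  # almost surely []
```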

After removing `--num-speculative-tokens 1`, vLLM works fine.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.