[Bug]: DeepSeek-R1-AWQ gets stuck with all tokens rejected when MTP is enabled. #13704
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Run command:
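The exact command is not reproduced in this copy of the report. A minimal sketch of the kind of invocation being described (the model path and parallelism settings are assumptions; only `--num-speculative-tokens 1` is confirmed by the report) would look like:

```bash
# Hypothetical reproduction command: the model path and --tensor-parallel-size
# are illustrative assumptions, not the reporter's actual values.
# --num-speculative-tokens 1 is the flag this report says triggers the hang
# (it enables MTP-based speculative decoding for DeepSeek models).
vllm serve cognitivecomputations/DeepSeek-R1-AWQ \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --num-speculative-tokens 1
```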
Symptom:
Upon receiving a request, only the first word (for example "Okay") is generated; after that, generation stalls and no new tokens are streamed.
As can be seen from the console log, the number of accepted tokens stays at 0 while the number of draft tokens keeps increasing.
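One way to watch the same counters outside the console log (assuming the OpenAI-compatible server on the default port; the exact metric names vary between vLLM versions) is to scrape the Prometheus endpoint and filter for the speculative-decoding counters:

```bash
# Filter the server's Prometheus metrics for the speculative-decoding
# accepted/draft token counters; grep on the common "spec_decode" prefix
# since the exact metric names depend on the vLLM version.
curl -s http://localhost:8000/metrics | grep -i spec_decode
```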
After removing `--num-speculative-tokens 1`, vLLM works fine.