| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM! :wave: | 1 | 308 | March 24, 2025 |
| What should be /dev/shm size for larger models | 0 | 1 | April 7, 2025 |
| Can Lora adapters be loaded on different GPUs | 1 | 3 | April 7, 2025 |
| Grammar CPU bound performance | 1 | 10 | April 7, 2025 |
| Does vLLM support BERT model | 2 | 28 | April 7, 2025 |
| Making best use of varying GPU generations | 0 | 1 | April 7, 2025 |
| Conda and setup.py conflicting advice | 0 | 2 | April 7, 2025 |
| vLLM output vs Ollama | 5 | 43 | April 7, 2025 |
| Engine args ~deep-dive? | 1 | 5 | April 7, 2025 |
| Is it possible to initialize an AsyncLLMEngine inside the LLM object? | 0 | 5 | April 7, 2025 |
| Using openai compatible with `beta.chat.completions.parse` can't do tool call and structured output together | 0 | 6 | April 6, 2025 |
| How to set up AMD GPU as default in dual stack GPU? | 3 | 16 | April 6, 2025 |
| How to get the dev version vllm docker image? | 1 | 15 | April 5, 2025 |
| I would like to test KVCache write on SSD storage or shared KVCache storage and benchmark this | 5 | 39 | April 5, 2025 |
| Running Gemma 3 on multi-chip TPU failure | 0 | 14 | April 4, 2025 |
| No HIP GPUs are available for VeRL | 4 | 38 | April 4, 2025 |
| Numerical difference between vLLM logprobs and huggingface logprobs | 7 | 150 | April 4, 2025 |
| How to load the model successfully through multi-card in vllm? | 5 | 52 | April 3, 2025 |
| How to use vllm server in intranet | 5 | 38 | April 2, 2025 |
| Improving computing power at home for n00bs | 7 | 51 | April 2, 2025 |
| Any plan to support an optimization about computation and communication overlapping | 2 | 13 | April 2, 2025 |
| vLLM V1 - Default max CUDA graph size | 0 | 39 | April 1, 2025 |
| Question about vLLM and vLLM Ascend versioning policy | 4 | 92 | April 1, 2025 |
| RL Training with vLLM Rollout: How to Mitigate Load Imbalance from Variable Response Lengths | 4 | 98 | April 1, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 12 | March 31, 2025 |
| Why remove bonus token of request in draft model? | 0 | 16 | March 30, 2025 |
| Why vllm cannot fully use GPU in batch processing | 12 | 63 | March 29, 2025 |
| Jetson Orin, CUDA error: no kernel image is available for execution on the device | 0 | 26 | March 29, 2025 |
| How to debug chat API | 0 | 22 | March 28, 2025 |
| Why zero_point is set False in gptq_marlin? | 0 | 11 | March 28, 2025 |