| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM! :wave: | 1 | 308 | March 24, 2025 |
| What should be /dev/shm size for larger models | 0 | 1 | April 7, 2025 |
| Can Lora adapters be loaded on different GPUs | 1 | 3 | April 7, 2025 |
| Grammar CPU bound performance | 1 | 10 | April 7, 2025 |
| Does vLLM support BERT model | 2 | 28 | April 7, 2025 |
| Making best use of varying GPU generations | 0 | 1 | April 7, 2025 |
| Conda and setup.py conflicting advice | 0 | 2 | April 7, 2025 |
| vLLM output vs Ollama | 5 | 43 | April 7, 2025 |
| Engine args ~deep-dive? | 1 | 5 | April 7, 2025 |
| Is it possible to initialize an AsyncLLMEngine inside the LLM object? | 0 | 5 | April 7, 2025 |
| Using openai compatible with `beta.chat.completions.parse` can't do tool call and structured output together | 0 | 6 | April 6, 2025 |
| How to set up AMD GPU as default in dual stack GPU? | 3 | 16 | April 6, 2025 |
| How to get the dev version vllm docker image? | 1 | 15 | April 5, 2025 |
| I would like to test KVCache write on SSD storage or shared KVCache storage and benchmark this | 5 | 39 | April 5, 2025 |
| Running Gemma 3 on multi-chip TPU failure | 0 | 14 | April 4, 2025 |
| No HIP GPUs are available for VeRL | 4 | 38 | April 4, 2025 |
| Numerical difference between vLLM logprobs and huggingface logprobs | 7 | 150 | April 4, 2025 |
| How to load the model successfully through multi-card in vllm? | 5 | 52 | April 3, 2025 |
| How to use vllm server in intranet | 5 | 38 | April 2, 2025 |
| Improving computing power at home for n00bs | 7 | 51 | April 2, 2025 |
| Any plan to support an optimization about computation and communication overlapping | 2 | 13 | April 2, 2025 |
| vLLM V1 - Default max CUDA graph size | 0 | 39 | April 1, 2025 |
| Question about vLLM and vLLM Ascend versioning policy | 4 | 92 | April 1, 2025 |
| RL Training with vLLM Rollout: How to Mitigate Load Imbalance from Variable Response Lengths | 4 | 98 | April 1, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 12 | March 31, 2025 |
| Why remove bonus token of request in draft model? | 0 | 16 | March 30, 2025 |
| Why vllm cannot fully use GPU in batch processing | 12 | 63 | March 29, 2025 |
| Jetson Orin, CUDA error: no kernel image is available for execution on the device | 0 | 26 | March 29, 2025 |
| How to debug chat API | 0 | 22 | March 28, 2025 |
| Why zero_point is set False in gptq_marlin? | 0 | 11 | March 28, 2025 |