Upon starting the inference server with Qwen2VL via:
./koboldcpp --model "Qwen2-VL-2B-Instruct-Q6_K_L.gguf" --mmproj "mmproj-Qwen2-VL-2B-Instruct-f16.gguf"
output-wise it seems to be working fine, but on every turn of a multi-turn chat it appears to re-encode the image from scratch, which is inefficient in resource-constrained environments. I'm not sure whether this is the intended behaviour; from my limited understanding, it should be able to reuse the image embeddings originally produced via the projector and only append to them, rather than regenerating them each turn (a rough sketch of what I mean is below).
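To illustrate the kind of reuse I have in mind, here is a minimal sketch (not koboldcpp's actual API; `encode_image_with_mmproj` and the cache are hypothetical) of caching the projector output keyed by a hash of the image bytes, so later turns with the same image skip the vision encoder:

import hashlib

# Hypothetical stand-in for the expensive vision-tower + projector pass that
# the mmproj model performs; here it just returns a dummy embedding vector.
def encode_image_with_mmproj(image_bytes: bytes) -> list[float]:
    return [0.0] * 1536  # placeholder for the real image embedding tokens

_embedding_cache: dict[str, list[float]] = {}

def get_image_embeddings(image_bytes: bytes) -> list[float]:
    """Reuse cached embeddings when the same image appears in a later turn."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _embedding_cache:
        # Pay the encoding cost only once per unique image.
        _embedding_cache[key] = encode_image_with_mmproj(image_bytes)
    return _embedding_cache[key]

With something like this, the second and later turns of the chat would only need to process the new text, not re-run the image encoder.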
Release: v1.82.4
Windows Subsystem for Linux (WSL) with the CPU backend.