16GB of VRAM is not enough.

#7
by junguang - opened

The speed is very fast: faster than GGUF Q8 and comparable to Q4. However, using this model with a standard LoRA (a large one, about 1.5GB) strains the 16GB of VRAM; CF offloads the load to system RAM to compensate, which pushes generation time past 300 seconds. Is there a tool available to convert the LoRA to INT8 format as well?

Check whether you can find another workflow; yours seems broken. I have a 16GB card too and can use many LoRAs at the same time, and 5+ LoRAs add maybe 10 seconds extra.

Works on 8GB of VRAM.
