16GB of VRAM is not enough.

by junguang - opened about 24 hours ago

The speed is very fast: faster than GGUF Q8 and comparable to Q4. However, using this model with a standard LoRA (a large one, 1.5 GB) strains the 16GB of VRAM. CF offloads the excess to system RAM to compensate, which pushes generation time past 300 seconds. Is there a tool available to convert the LoRA to INT8 format as well?

Sakujo

about 10 hours ago

Check if you can find another workflow; yours seems broken. I have a 16GB card too and can use several LoRAs at the same time, and it adds maybe 10 seconds extra with 5+ LoRAs.

Winnougan

Owner about 8 hours ago

Works on 8GB of VRAM.
