PhalaCloud/GLM-5.2-W4AFP8 · KTransformers + SGLang

KTransformers + SGLang

by mtcl - opened 6 days ago

Discussion

mtcl

6 days ago

Do you think this will work with KTransformers + SGLang for mixed CPU/GPU inference?

h4x3rotab

Phala org 6 days ago

Yes, it should work, though I have never tested KTransformers on this checkpoint. The exact GPU kernel and the expected data types may vary depending on the serving engine.

For example, the checkpoint doesn't work directly with vLLM, because vLLM expects a W4A8 format with channel scales. The AWQ weights are bit-for-bit identical, but the representation of the activations needs to be converted. This is specific to vLLM; I'm not sure how it applies to KTransformers.
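
Since the expected data types depend on the engine, it can help to look at what a checkpoint shard actually stores before trying to load it. The sketch below is only an illustration and assumes the weights ship as `.safetensors` files. It reads the JSON header using the standard library (the format starts with an 8-byte little-endian header length, followed by the JSON header) and counts tensors by name suffix and dtype. The function names and any shard path you pass in are hypothetical, and it says nothing about what KTransformers or SGLang will accept.

```python
import json
import struct
import sys
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian u64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def dtype_summary(header):
    """Count tensors per (name suffix, dtype) pair."""
    counts = Counter()
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        # Use the last dotted component of the tensor name as the suffix.
        suffix = name.rsplit(".", 1)[-1]
        counts[(suffix, info["dtype"])] += 1
    return counts


if __name__ == "__main__":
    # Pass one or more shard paths on the command line (paths are up to you).
    for shard in sys.argv[1:]:
        summary = dtype_summary(read_safetensors_header(shard))
        for (suffix, dtype), n in sorted(summary.items()):
            print(f"{shard}: {suffix:<24} {dtype:<10} {n}")
```

Comparing the suffixes and dtypes printed for this checkpoint against those of a checkpoint known to load in the target engine is a quick way to spot a format mismatch of the kind described above.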
