KTransformers + SGLang

#1
by mtcl - opened

Do you think this will work with KTransformers + SGLang for mixed CPU/GPU inference?

Phala org

Yes, it should. I have never tested KTransformers on this checkpoint, though. The exact GPU kernel and the expected data types may vary depending on the serving engine.

For example, the checkpoint doesn't work directly with vLLM, because vLLM expects a W4A8 format with channel scales. The AWQ weights are bit-for-bit identical, but the representation of the activations needs to be converted. This is specific to vLLM; I'm not sure how it fits with KTransformers.
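
To make "channel scales" a bit more concrete, here is a minimal, generic sketch of symmetric 8-bit quantization with one scale per channel. It is only an illustration of the general idea, not the conversion code used by vLLM or KTransformers. The function names, the choice of one scale per row, and the sample values are all assumptions made up for this example.

```python
# Generic illustration only: symmetric per-channel int8 quantization.
# This is NOT the actual conversion code of vLLM or KTransformers;
# all names and values here are hypothetical.

def per_channel_scales(matrix, qmax=127):
    """Return one scale per row (treated as a channel): max(|x|) / qmax."""
    scales = []
    for row in matrix:
        peak = max(abs(x) for x in row)
        # Fall back to 1.0 so an all-zero channel does not get a zero scale.
        scales.append(peak / qmax if peak > 0 else 1.0)
    return scales

def quantize(matrix, scales, qmax=127):
    """Map each value to an integer in [-qmax, qmax] using its channel's scale."""
    return [
        [max(-qmax, min(qmax, round(x / s))) for x in row]
        for row, s in zip(matrix, scales)
    ]

def dequantize(qmatrix, scales):
    """Recover approximate real values from integers and channel scales."""
    return [[q * s for q in row] for row, s in zip(qmatrix, scales)]

values = [[0.5, -1.27, 0.0], [10.0, -2.5, 4.0]]
scales = per_channel_scales(values)
q = quantize(values, scales)
approx = dequantize(q, scales)
```

The point of the sketch is that the same underlying numbers can be stored with different scale layouts, so two engines can agree on the weights and still disagree on how the scales have to be laid out.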
