The snippets below show how to run Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM with vLLM, SGLang, Unsloth Studio, and Docker.

vLLM

How to use Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
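The same request can also be sent from Python. This is a minimal sketch that uses only the standard library, so the OpenAI SDK is not needed. It assumes the vLLM server above is listening on its default port 8000. The helper names build_chat_payload and chat are hypothetical and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server started above is listening on its default port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM"


def build_chat_payload(prompt, image_url=None, model=MODEL_ID):
    """Build an OpenAI-style chat-completions body: a text part plus an optional image part."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(prompt, image_url=None, base_url=BASE_URL, timeout=120):
    """POST the payload to /chat/completions and return the first choice's text."""
    body = json.dumps(build_chat_payload(prompt, image_url)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Usage: chat("Describe this image in one sentence.", "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"). For the SGLang server, pass base_url="http://localhost:30000/v1".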

Use Docker

docker model run hf.co/Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM

SGLang

How to use Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
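Loading a 27B checkpoint can take a while, so a script that sends requests right after launching the server may fail. This is a minimal sketch that polls until the server answers. It assumes the OpenAI-compatible GET /v1/models route, which both SGLang and vLLM expose. The function names http_status and wait_for_server are hypothetical, and the timeout values are arbitrary.

```python
import time
import urllib.request

# Assumption: the SGLang server above is on port 30000 (use port 8000 for the vLLM server).
BASE_URL = "http://localhost:30000"


def http_status(url, timeout):
    """GET url and return its HTTP status code; connection problems raise OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def wait_for_server(base_url=BASE_URL, timeout_s=900.0, interval_s=5.0,
                    fetch=http_status, sleep=time.sleep, clock=time.monotonic):
    """Poll GET {base_url}/v1/models until it returns 200 or timeout_s seconds pass.

    fetch, sleep and clock are injectable so the loop can be tested without a server.
    """
    url = f"{base_url}/v1/models"
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch(url, interval_s) == 200:
                return True
        except OSError:
            pass  # refused, reset, timed out, or HTTP error while the model is loading
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Usage: call wait_for_server() after starting the server and only send chat requests once it returns True.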

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM to start chatting

Load model with FastModel

# Install Unsloth first (in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM",
    max_seq_length=2048,
)


🌟 Qwopus3.5-27B-v3.5-INT4-FOEM

This is an unofficial 4-bit (INT4) weight-quantized version of Jackrong/Qwopus3.5-27B-v3.5.
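To see what INT4 means for memory, here is a rough back-of-the-envelope estimate of weight storage. It assumes all of the nominal 27B parameters are quantized, with a group size of 128 and one 16-bit scale plus one 4-bit zero point per group. This is a common GPTQ layout, but this card does not state the group size of this checkpoint. Embeddings and the output head are often kept in 16-bit, and the KV cache needs extra memory, so treat the result as a lower estimate. The function estimate_weight_gb is hypothetical.

```python
def estimate_weight_gb(n_params, bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Rough size in GB (1e9 bytes) of group-quantized weights.

    Each group of group_size weights stores one scale and one zero point on top of
    the bits-bit codes, so the effective cost per weight is
    bits + (scale_bits + zero_bits) / group_size.
    """
    bits_per_weight = bits + (scale_bits + zero_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


# INT4 with group size 128 versus an unquantized 16-bit baseline (no scales or zeros).
int4_gb = estimate_weight_gb(27e9)
fp16_gb = estimate_weight_gb(27e9, bits=16, scale_bits=0, zero_bits=0)
print(f"INT4: ~{int4_gb:.1f} GB, FP16: ~{fp16_gb:.1f} GB")
```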

🧠 Quantization Framework

GPTQModel

🗺️ Quantization Method

FOEM (AAAI 2026)

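FOEM is a post-training quantization method that improves on GPTQ. GPTQ-style error compensation is derived from a second-order approximation that treats the first-order error term as negligible, whereas FOEM also compensates for that first-order error (see the AAAI 2026 paper cited below). The quantized model keeps the same storage format and inference structure as a GPTQ model. It therefore works with existing GPTQ deployment pipelines while achieving better accuracy.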
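For intuition, the sketch below shows the GPTQ-style column-by-column error compensation that FOEM builds on. It illustrates plain GPTQ only; FOEM's first-order correction is not implemented here. It also simplifies GPTQModel in several ways: it uses one quantization group per output row, with no act-order and no lazy batched updates. It does show why the result plugs into GPTQ pipelines, since the output is INT4 codes plus per-row scales and zero points. All function names are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

MAXQ = 15  # INT4 codes are 0..15


def row_grid(W):
    """Per-output-row asymmetric scale and zero point from the row's min/max."""
    wmin = np.minimum(W.min(axis=1), 0.0)
    wmax = np.maximum(W.max(axis=1), 0.0)
    scale = (wmax - wmin) / MAXQ
    scale[scale == 0] = 1.0  # guard against all-zero rows
    zero = np.round(-wmin / scale)
    return scale, zero


def fake_quant(w, scale, zero):
    """Round onto the INT4 grid, clamp to 0..15, and map back to floats (dequantize)."""
    q = np.clip(np.round(w / scale) + zero, 0, MAXQ)
    return scale * (q - zero)


def rtn_quantize(W):
    """Baseline: plain round-to-nearest with no error compensation."""
    scale, zero = row_grid(W)
    return fake_quant(W, scale[:, None], zero[:, None])


def gptq_quantize(W, X, damp=0.01):
    """GPTQ-style quantization of W (out x in) using calibration inputs X (in x n_samples)."""
    W = np.array(W, dtype=np.float64)          # working copy that receives compensation
    scale, zero = row_grid(W)                  # grid fixed from the original weights
    H = 2.0 * X @ X.T                          # Hessian of ||W X - Q X||^2 per row
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])  # dampening for stability
    U = np.linalg.cholesky(np.linalg.inv(H)).T  # upper Cholesky factor of H^-1
    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        Q[:, i] = fake_quant(W[:, i], scale, zero)
        err = (W[:, i] - Q[:, i]) / U[i, i]
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])  # push the error onto unquantized columns
    return Q


def output_error(W, Q, X):
    """Squared Frobenius norm of the layer-output error ||W X - Q X||^2."""
    return float(np.sum(((W - Q) @ X) ** 2))


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))
X = rng.standard_normal((64, 512)) + rng.standard_normal((1, 512))  # correlated input features
print(output_error(W, rtn_quantize(W), X), output_error(W, gptq_quantize(W, X), X))
```

Compensation pays off most when input features are correlated, as in the toy X above, because later columns can absorb the rounding error of earlier ones along shared directions.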

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
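The card says the examples were sampled randomly but does not give a seed or split. This is a minimal sketch of reproducible uniform sampling without replacement. The function sample_calibration and its default seed are hypothetical. With the Hugging Face datasets library the same idea can be written as ds.shuffle(seed=0).select(range(512)).

```python
import random


def sample_calibration(examples, n=512, seed=0):
    """Return n examples drawn uniformly at random without replacement.

    A fixed seed makes the calibration set reproducible across runs.
    """
    examples = list(examples)
    if n > len(examples):
        raise ValueError(f"requested {n} examples but only {len(examples)} are available")
    return random.Random(seed).sample(examples, n)
```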

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.

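Example evaluation command (WikiText perplexity via lm-evaluation-harness with its vLLM backend; the pretrained argument points to a local copy of the model and can be replaced with the Hub ID Xingyu-Zheng/Qwopus3.5-27B-v3.5-INT4-FOEM):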

lm-eval --model vllm --model_args pretrained=models/gptqmodel/Qwopus3.5-27B-v3.5-INT4-FOEM,tensor_parallel_size=1,gpu_memory_utilization=0.45 --tasks wikitext --batch_size 1
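For the wikitext task, lm-evaluation-harness reports word_perplexity, byte_perplexity and bits_per_byte (lower is better). This makes it a convenient check of how far the INT4 model drifts from the original. Below is a simplified re-implementation of how such corpus-level numbers can be aggregated. It assumes log-likelihoods are pooled across documents before exponentiating, as the harness's weighted perplexity does. wikitext_metrics is a hypothetical helper, not the harness's code.

```python
import math


def wikitext_metrics(docs):
    """Corpus-level WikiText metrics from per-document results.

    docs: iterable of (loglikelihood, n_words, n_bytes), where loglikelihood is the
    model's total natural-log likelihood of the whole document.
    Log-likelihoods are pooled over the corpus before exponentiating, so longer
    documents weigh more than in a plain mean of per-document perplexities.
    """
    docs = list(docs)
    total_ll = sum(ll for ll, _, _ in docs)
    total_words = sum(w for _, w, _ in docs)
    total_bytes = sum(b for _, _, b in docs)
    return {
        "word_perplexity": math.exp(-total_ll / total_words),
        "byte_perplexity": math.exp(-total_ll / total_bytes),
        "bits_per_byte": -total_ll / total_bytes / math.log(2),
    }
```

Toy usage: wikitext_metrics([(-6.0, 2, 8), (-2.0, 2, 8)]) gives a word perplexity of e² and 0.5 / ln 2 bits per byte.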

⚠️ Limitations & Intended Use

(Adapted from the original repository of Jackrong/Qwopus3.5-27B-v3.5)

Possible overfitting if scaling exceeds optimal regime
Reasoning may still exhibit instability in edge cases
Tool-calling performance depends on environment integration
Not all capabilities are fully benchmarked yet

🙏 Acknowledgements

Special thanks to Jackrong for providing the original model: Qwopus3.5-27B-v3.5.

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_qwopus35_v35,
  title        = {Qwopus3.5-27B-v3.5},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face}
}

@misc{qubitium2024gptqmodel,
  author = {ModelCloud.ai and qubitium@modelcloud.ai},
  title = {GPT-QModel},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/modelcloud/gptqmodel}},
  note = {Contact: qubitium@modelcloud.ai},
  year = {2024},
}

@inproceedings{zheng2026first,
  title={First-order error matters: Accurate compensation for quantized large language models},
  author={Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={34},
  pages={28883--28891},
  year={2026}
}