How to use aitf-ub-2026/ub-sr04-qwen3.5-4b-cpt2-sft-game with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for aitf-ub-2026/ub-sr04-qwen3.5-4b-cpt2-sft-game to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for aitf-ub-2026/ub-sr04-qwen3.5-4b-cpt2-sft-game to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for aitf-ub-2026/ub-sr04-qwen3.5-4b-cpt2-sft-game to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="aitf-ub-2026/ub-sr04-qwen3.5-4b-cpt2-sft-game",
    max_seq_length=2048,
)
SR4 Deployment Guide for RunPod
This model (ub-sr04-qwen3.5-4b-cpt2-sft-game) is an LLM game content generator for the Sekolah Rakyat platform. It runs as an OpenAI-compatible inference server using Unsloth rather than vLLM (the hybrid Qwen3 architecture is not compatible with vLLM).
The file server.py is already included in this repo and is downloaded together with the model.
Hardware Requirements
| GPU | VRAM | Mode |
|---|---|---|
| A100 40GB | 40 GB | bfloat16 (~9 GB used) |
| L4 24GB | 24 GB | bfloat16 or 4-bit (~3 GB) |
| T4 16GB | 16 GB | 4-bit required |
Storage: at least 20 GB · RAM: at least 16 GB (CPU)
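The VRAM figures in the table are close to what a rough estimate of weight memory gives. The sketch below assumes about 4 billion parameters (inferred from the "4b" in the model name, not stated in this guide) and counts only the weights. Activations, the KV cache, and CUDA overhead would account for the remainder of the ~9 GB and ~3 GB figures.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for model weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~4e9 parameters, taken from the "4b" in the model name.
NUM_PARAMS = 4e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # bfloat16 stores 16 bits per weight
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit weights, ignoring quantization metadata

print(f"bfloat16 weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:    ~{int4_gb:.1f} GB")
```

Under these assumptions the weights alone come to about 8 GB in bfloat16 and about 2 GB in 4-bit, which leaves roughly 1 GB of runtime overhead in each row of the table.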
Setup on RunPod
1. Create a Pod
On runpod.io → Deploy:
- Template: RunPod PyTorch
- Container Disk: 20 GB · Volume: 20 GB (mounted at /workspace)
- Expose the desired port (default: 8081)
2. Download Model + Server Script
huggingface-cli login # requires an HF token with access to this repo
huggingface-cli download aitf-ub-2026/ub-sr04-qwen3.5-4b-cpt2-sft-game \
--local-dir /workspace/models/sr4
server.py is downloaded along with the model to /workspace/models/sr4/server.py.
The model is stored on the persistent volume, so it does not need to be downloaded again after a pod restart.
3. Install Dependencies
pip install --upgrade pip
pip install unsloth unsloth_zoo accelerate bitsandbytes
pip install fastapi "uvicorn[standard]"
pip install git+https://github.com/huggingface/transformers.git
torch is already available in the RunPod PyTorch template, so there is no need to reinstall it. bitsandbytes is required for --4bit mode.
4. Run the Server
# bfloat16 (default)
python /workspace/models/sr4/server.py --model /workspace/models/sr4 --port 8081
# 4-bit: for GPUs with limited VRAM (L4, T4)
python /workspace/models/sr4/server.py --model /workspace/models/sr4 --port 8081 --4bit
The server is ready when these log lines appear:
[SR4] Model loaded — GPU X.X / XX.X GB
[SR4] Serving on http://0.0.0.0:8081
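Instead of watching the log, a script can poll the /health endpoint listed in the API section until it reports {"status": "ok"}. The following is a minimal sketch using only the standard library. The function names, the timeout, and the polling interval are illustrative choices, not part of server.py.

```python
import json
import time
import urllib.request


def fetch_health(url: str) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(url: str, timeout_s: float = 300.0, interval_s: float = 5.0,
                     fetch=fetch_health, sleep=time.sleep) -> bool:
    """Poll `url` until it reports {"status": "ok"} or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # OSError covers connection errors (urllib's URLError is a subclass);
            # ValueError covers a body that is not valid JSON. Retry in both cases.
            pass
        sleep(interval_s)
    return False
```

For the default port this would be called as wait_until_ready("http://localhost:8081/health"). The fetch and sleep parameters exist only so the loop can be exercised without a running server.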
Auto-start after Restart
RunPod does not restart services automatically. Add the server to crontab:
crontab -e
# add:
@reboot sleep 30 && python /workspace/models/sr4/server.py --model /workspace/models/sr4 --port 8081
API
Endpoints
GET /health → {"status": "ok", "model": "..."}
GET /v1/models → list of available models
POST /v1/chat/completions → generate (OpenAI-compatible)
Example Request
curl -X POST http://localhost:8081/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "sr4-game",
"messages": [
{"role": "system", "content": "...system prompt..."},
{"role": "user", "content": "{\"difficulty\": 1, \"atps\": [...], \"bacaan\": \"...\"}"}
],
"max_tokens": 3500,
"temperature": 0.0
}'
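The same request can be issued from Python. The sketch below mirrors the curl example using only the standard library. The helper names and the BASE_URL constant are illustrative, and the system prompt and game input values are left to the caller because this guide elides them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # assumption: server runs locally on the default port


def build_chat_request(system_prompt: str, game_input: dict,
                       max_tokens: int = 3500, temperature: float = 0.0) -> dict:
    """Build the JSON body for POST /v1/chat/completions.

    The user message is the game input serialized as a JSON string,
    matching the shape of the curl example.
    """
    return {
        "model": "sr4-game",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": json.dumps(game_input, ensure_ascii=False)},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def post_chat(body: dict, base_url: str = BASE_URL, timeout_s: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would pass a dict with the difficulty, atps, and bacaan keys to build_chat_request and hand the result to post_chat. Serializing with json.dumps avoids the manual quote escaping needed in the curl version.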
The response field choices[0].message.content contains the game JSON (already stripped of markdown fences and thinking blocks).
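How server.py performs this cleanup is not shown in this guide. The sketch below is one way a client could defensively apply the same kind of stripping before parsing, in case a fence or thinking block slips through. The function names and regular expressions are assumptions made for illustration.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker, built here to keep this snippet fence-safe

# Matches a <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# Matches text wrapped in a single fence with an optional language tag.
FENCE_RE = re.compile(r"^" + FENCE + r"[\w-]*\s*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL)


def clean_model_output(text: str) -> str:
    """Remove <think>...</think> blocks and one surrounding markdown fence."""
    text = THINK_RE.sub("", text).strip()
    match = FENCE_RE.match(text)
    return match.group(1).strip() if match else text


def extract_game(response: dict) -> dict:
    """Parse the game JSON from an OpenAI-style chat completion response."""
    content = response["choices"][0]["message"]["content"]
    return json.loads(clean_model_output(content))
```

When the server has already cleaned the content, clean_model_output returns it unchanged, so extract_game reduces to a plain json.loads on the message content.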
Notes
- Chat template: ChatML (<|im_start|>/<|im_end|>)
- Max seq length: 4096 tokens
- Thinking is disabled, so the output is JSON directly, without a <think> block
- The "MISSING: model.visual.*" log from Unsloth is normal: this model is pure text, and no vision encoder is active during inference
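To make the ChatML note concrete, the sketch below renders a message list in the generic ChatML layout. It illustrates the format only. The tokenizer's bundled chat template remains the authoritative version and may add details (for example, around disabled thinking) that this hypothetical helper does not reproduce.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render messages in ChatML: each turn is wrapped in <|im_start|>role ... <|im_end|>."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to produce the reply
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "...system prompt..."},
    {"role": "user", "content": "...game input JSON..."},
])
print(prompt)
```

In practice the server applies the template itself, so clients of /v1/chat/completions only send the messages array.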