---
license: apache-2.0
base_model: huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated
tags:
- qwen
- vision
- fp8
- abliterated
pipeline_tag: image-text-to-text
---

# Huihui-Qwen3-VL-32B-Instruct-FP8-Abliterated

This repository contains an **FP8 (float8_e4m3fn)** quantization of the abliterated Qwen3-VL 32B Instruct model from **huihui-ai**. Quantization reduces the checkpoint to approximately 35 GB, which makes it easier to deploy on NVIDIA GPUs with native FP8 support (Hopper and Ada Lovelace architectures).
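
To check whether a given GPU belongs to the architectures named above, one option is to inspect its CUDA compute capability. The sketch below assumes that Ada Lovelace reports 8.9 and Hopper reports 9.0, and treats anything at or above 8.9 as FP8-capable. The helper names `supports_native_fp8` and `current_gpu_capability` are illustrative and not part of any library.

```python
def supports_native_fp8(capability):
    """Return True if a (major, minor) compute capability is 8.9 (Ada) or newer."""
    return tuple(capability) >= (8, 9)


def current_gpu_capability():
    """Return the first GPU's (major, minor) capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


cap = current_gpu_capability()
if cap is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
elif supports_native_fp8(cap):
    print(f"Compute capability {cap[0]}.{cap[1]}: native FP8 should be available.")
else:
    print(f"Compute capability {cap[0]}.{cap[1]}: no native FP8 support expected.")
```

Older GPUs may still be able to load the checkpoint, but they would not benefit from hardware FP8 kernels.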

### Model Description
* **Original Model:** [huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated)
* **Format:** FP8 (native float8_e4m3fn)
* **Size:** ~34.9 GB
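
As a rough sanity check on the size figure, weight storage can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 32 billion parameters (taken from the "32B" in the model name, not from an exact count) and decimal gigabytes. The function name `weight_size_gb` is illustrative.

```python
def weight_size_gb(num_params, bytes_per_param):
    """Estimate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NOMINAL_PARAMS = 32e9  # assumption: nominal count implied by "32B" in the model name

print(f"FP8  (1 byte/param):  {weight_size_gb(NOMINAL_PARAMS, 1):.0f} GB")
print(f"FP16 (2 bytes/param): {weight_size_gb(NOMINAL_PARAMS, 2):.0f} GB")
```

The nominal estimate gives 32 GB for FP8 against 64 GB for 16-bit weights. The reported ~34.9 GB is somewhat higher, which would be consistent with a true parameter count above the nominal figure or with some tensors being kept in higher precision, though neither is confirmed here.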

### Requirements
Ensure you have the latest version of the following libraries:
- `transformers`
- `accelerate`
- `qwen-vl-utils`
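
A quick way to see which versions are currently installed is the standard-library `importlib.metadata` module. This is a minimal sketch; the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("transformers", "accelerate", "qwen-vl-utils"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

Any package reported as missing or outdated can then be upgraded with `pip install -U`.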

### Usage Example
```python
# Install dependencies first (run in a shell, or prefix with "!" in a notebook cell):
#   pip install -U transformers accelerate torch torchvision
# @title 🎭 Maggie VL: Flexible Storyteller (Text & Image)
import json, os, torch, gc
from PIL import Image
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# --- 1. CONFIGURATION ---
PROMPT_BARU = "Maggie sits on the porch of her house, watching the sunset in the Kingdom of Alghafar"
IMAGE_PATH = ""  # Path to an image (optional)
RESET_HISTORY = False

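MODEL_ID = "Maggie-32B-FP8-Instruct-abliterated"  # Local directory or Hub repo ID of the FP8 checkpoint
FILE = "maggie_history_cerita.json"  # Conversation history is persisted here between runs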

SYSTEM_INS = """Name: Maggie. Age: 19.
Appearance: Exceptionally beautiful, platinum blonde hair, 165 cm tall, 45 kg, slender.
Background: Daughter of Edward (a cloth merchant). Lives in the Kingdom of Alghafar.
Personality: Polite, yet with the firmness typical of the upper middle class."""

# --- 2. LOAD MODEL (Singleton Pattern) ---
# In a notebook, skip reloading if the model is already in memory from an earlier run.
if 'model' not in globals():
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
    print(f"🚀 [SYSTEM]: Loading Maggie 32B on {gpu_name}...")
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype=torch.float16,
        trust_remote_code=True, low_cpu_mem_usage=True
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    print(f"✨ [SYSTEM]: {gpu_name} is ready!\n")

# --- 3. HISTORY LOGIC ---
if RESET_HISTORY and os.path.exists(FILE):
    os.remove(FILE)
    print("🧹 [HISTORY]: Old history has been deleted.")

# Resume the saved conversation if one exists; otherwise start from the system prompt.
if os.path.exists(FILE):
    with open(FILE, "r", encoding="utf-8") as f:
        msg = json.load(f)
else:
    msg = [{"role": "system", "content": [{"type": "text", "text": SYSTEM_INS}]}]

# Build the user message (optional image first, then the text prompt)
u_content = []
img = None
if IMAGE_PATH and os.path.exists(IMAGE_PATH):
    img = Image.open(IMAGE_PATH).convert("RGB")
    u_content.append({"type": "image", "image": img})
u_content.append({"type": "text", "text": PROMPT_BARU})
msg.append({"role": "user", "content": u_content})

# --- 4. INFERENCE ---
prompt_text = processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt_text], images=[img] if img is not None else None, padding=True, return_tensors="pt").to(model.device)

print("✍️  [qwen-32B]: Composing the scene...")
with torch.no_grad():
    out_ids = model.generate(
        **inputs, 
        max_new_tokens=1024, 
        temperature=0.7, 
        top_p=0.9, 
        do_sample=True, 
        repetition_penalty=1.1
    )
    resp = processor.batch_decode(out_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

# --- 5. CLEANUP & DISPLAY ---
# Free unused memory: drop tensor references first, then release cached GPU blocks
del inputs, out_ids
gc.collect()
torch.cuda.empty_cache()

print("\n" + "━"*60)
print(f"📖 SCENE: {PROMPT_BARU.upper()}")
print("━"*60)
print(f"\n{resp.strip()}\n")
print("━"*60)

# Save history (text-only mode): swap the image entry for a text marker so the log stays JSON-serializable
saved_text = f"[Visual Input] {PROMPT_BARU}" if img is not None else PROMPT_BARU
msg[-1] = {"role": "user", "content": [{"type": "text", "text": saved_text}]}
msg.append({"role": "assistant", "content": [{"type": "text", "text": resp}]})
with open(FILE, "w", encoding="utf-8") as f:
    json.dump(msg, f, indent=4)