---
license: apache-2.0
base_model: huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated
tags:
- qwen
- vision
- fp8
- abliterated
pipeline_tag: image-text-to-text
---

# Huihui-Qwen3-VL-32B-Instruct-FP8-Abliterated

This repository contains the **FP8 (float8_e4m3fn)** quantized version of the abliterated model from **huihui-ai**. Quantization reduces the model size to approximately 35 GB, making it more efficient to deploy on modern NVIDIA GPUs (Hopper and Ada Lovelace architectures).

### Model Description

* **Original Model:** [huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated)
* **Format:** FP8 (native float8_e4m3fn)
* **Size:** ~34.9 GB

### Requirements

Ensure you have the latest version of the following libraries:

- `transformers`
- `accelerate`
- `qwen-vl-utils`

### Usage Example

```python
# Install dependencies first (shell command; prefix with "!" in a notebook):
#   pip install -U transformers accelerate torch torchvision

# @title 🎭 Maggie VL: Flexible Storyteller (Text & Image)
import gc
import json
import os

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# --- 1. CONFIGURATION ---
PROMPT_BARU = "Maggie sits on the porch of her house, watching the sunset over the Kingdom of Alghafar"
IMAGE_PATH = ""  # Image path (optional)
RESET_HISTORY = False
MODEL_ID = "Maggie-32B-FP8-Instruct-abliterated"
FILE = "maggie_history_cerita.json"
SYSTEM_INS = """Name: Maggie. Age: 19.
Appearance: Exceptionally beautiful, platinum blonde hair, 165 cm tall, 45 kg, slender.
Background: Daughter of Edward (a cloth merchant). Lives in the Kingdom of Alghafar.
Personality: Polite, yet with the firmness typical of the upper middle class."""

# --- 2. LOAD MODEL (Singleton Pattern) ---
# Load only once per session; re-running this cell reuses the existing model.
if 'model' not in globals():
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
    print(f"🚀 [SYSTEM]: Activating Maggie 32B on {gpu_name}...")
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        MODEL_ID,
        device_map="auto",
        torch_dtype=torch.float16,
        trust_remote_code=True,
        low_cpu_mem_usage=True
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    print(f"✨ [SYSTEM]: {gpu_name} READY FOR ACTION!\n")

# --- 3. HISTORY LOGIC ---
if RESET_HISTORY and os.path.exists(FILE):
    os.remove(FILE)
    print("🧹 [HISTORY]: Old history has been deleted.")

if os.path.exists(FILE):
    with open(FILE, "r") as f:
        msg = json.load(f)
else:
    msg = [{"role": "system", "content": [{"type": "text", "text": SYSTEM_INS}]}]

# Build the user content (the image is optional)
u_content = []
img = None
if IMAGE_PATH and os.path.exists(IMAGE_PATH):
    img = Image.open(IMAGE_PATH).convert("RGB")
    u_content.append({"type": "image", "image": img})
u_content.append({"type": "text", "text": PROMPT_BARU})
msg.append({"role": "user", "content": u_content})

# --- 4. INFERENCE ---
prompt_text = processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[prompt_text],
    images=[img] if img is not None else None,
    padding=True,
    return_tensors="pt"
).to(model.device)

print("✍️ [qwen-32B]: Composing the scene...")
with torch.no_grad():
    out_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=1.1
    )

# Decode only the newly generated tokens, skipping the prompt
resp = processor.batch_decode(out_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]

# --- 5. CLEANUP & DISPLAY ---
# Free unused memory
del inputs
gc.collect()
torch.cuda.empty_cache()

print("\n" + "━" * 60)
print(f"📖 SCENE: {PROMPT_BARU.upper()}")
print("━" * 60)
print(f"\n{resp.strip()}\n")
print("━" * 60)

# Save history (text-only mode): the image itself is not JSON-serializable,
# so the last user turn is replaced with a text placeholder.
msg[-1] = {"role": "user", "content": [{"type": "text", "text": f"[Visual Input] {PROMPT_BARU}" if img else PROMPT_BARU}]}
msg.append({"role": "assistant", "content": [{"type": "text", "text": resp}]})
with open(FILE, "w") as f:
    json.dump(msg, f, indent=4)
```
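
### Pre-flight Check

The size and hardware notes at the top of this card can be sanity-checked before downloading. The following is a minimal sketch, not part of this repository: `estimate_weight_gb` and `has_native_fp8` are hypothetical helper names, the 32e9 parameter count is an assumption taken from the "32B" in the model name, and the hardware check relies on Ada Lovelace being compute capability 8.9 and Hopper being 9.0.

```python
# Rough pre-flight helpers. All names are illustrative, not a library API.

BYTES_PER_PARAM = {"fp8": 1, "fp16": 2, "bf16": 2, "fp32": 4}


def estimate_weight_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring scales and metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def has_native_fp8(major: int, minor: int) -> bool:
    """True for compute capability 8.9 (Ada Lovelace), 9.0 (Hopper), or newer."""
    return (major, minor) >= (8, 9)


if __name__ == "__main__":
    # Assumption: a nominal 32e9 parameters, taken from the "32B" in the model name.
    nominal_params = 32e9
    for dtype in ("fp16", "fp8"):
        print(f"{dtype}: ~{estimate_weight_gb(nominal_params, dtype):.0f} GB")

    # The GPU check only runs when PyTorch and a CUDA device are present.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f"Native FP8 support: {has_native_fp8(major, minor)}")
```

The nominal estimate for FP8 comes out slightly below the ~34.9 GB reported above. The gap is plausibly explained by the true parameter count exceeding the nominal figure and by some tensors being kept in higher precision, but this card does not break the number down.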