---
license: apache-2.0
datasets:
- OpceanAI/Yuuki-dataset
- OpceanAI/Yuuki-Personality
language:
- en
- es
metrics:
- perplexity
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
tags:
- vision-language
- multimodal
- pytorch
- unsloth
- personality
- bilingual
- opceanai
- yuuki
- fine-tuned
- chat
pipeline_tag: image-text-to-text
---

# Yuuki NxG VL

**A 7B Vision-Language Model Fine-Tuned for Bilingual Conversation**

**Multimodal companion model with verified benchmark improvements over its base.**

**Qwen2.5-VL architecture. 7 billion parameters. Vision + Text. Apache 2.0.**

[Benchmarks](#benchmark-results) · [Usage](#usage)

[![License](https://img.shields.io/badge/Apache_2.0-1a1a2e?style=flat-square&logo=opensourceinitiative&logoColor=white)](LICENSE)   [![Base Model](https://img.shields.io/badge/Qwen2.5--VL--7B-1a1a2e?style=flat-square&logo=alibabadotcom&logoColor=white)](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)   [![Framework](https://img.shields.io/badge/Transformers-1a1a2e?style=flat-square&logo=huggingface&logoColor=white)](https://huggingface.co/docs/transformers)   [![DOI](https://img.shields.io/badge/DOI-10.57967%2Fhf%2F8028-1a1a2e?style=flat-square)](https://doi.org/10.57967/hf/8028)

---
## What is Yuuki NxG VL?

**Yuuki NxG VL** is a 7-billion-parameter vision-language model fine-tuned from [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for bilingual (English and Spanish) open-ended conversation and visual understanding. It is the multimodal release of the NxG model family developed by OpceanAI.

The model was fine-tuned on a curated bilingual dataset without any proprietary infrastructure. All Yuuki NxG VL scores, and the Qwen2.5-VL-7B base scores used for the head-to-head comparison, were produced with a custom 0-shot evaluation script on a Colab A100.

Fine-tuning typically degrades a base model's benchmark scores. Despite this, Yuuki NxG VL improves on its base model on 5 of 8 benchmarks in a direct head-to-head comparison using identical methodology. It also achieves the **highest TruthfulQA score** among the 10 compared models, including models up to 70B parameters.

---
## Model Summary

**Architecture**

| Property | Value |
|:---------|:------|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | 7B |
| Modalities | Vision + Text |
| Fine-tuning | Supervised fine-tuning (LoRA) |
| Training Examples | ~10,000 |
| Context Length | 2,048 tokens |

**Release**

| Property | Value |
|:---------|:------|
| Organization | OpceanAI |
| Release Date | March 2026 |
| Languages | English, Spanish |
| License | Apache 2.0 |
| Evaluation | Custom 0-shot script |
| Compute Budget | ~$15 USD |

---
## Benchmark Results

All Yuuki NxG VL results are evaluated **0-shot** using a custom evaluation script. The Qwen2.5-VL-7B base was evaluated with the same script; scores for all other models are taken from official technical reports that use few-shot prompting (5–25 shots). Direct numerical comparison therefore systematically favors base models and models evaluated with few-shot prompting.
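
The card does not publish its evaluation script, so the exact scoring rule is unknown. A common 0-shot protocol for multiple-choice benchmarks such as ARC-C or MMLU is to score every answer option by its log-likelihood under the model and pick the highest, with no solved examples in the prompt. The sketch below illustrates that protocol under this assumption only; `zero_shot_accuracy`, the item format, and the `score_fn` hook are hypothetical and are not OpceanAI's script.

```python
from typing import Callable, Sequence


def zero_shot_accuracy(items: Sequence[dict], score_fn: Callable[[str, str], float]) -> float:
    """Return the fraction of items whose highest-scoring option is the gold answer.

    Hypothetical item format: {"question": str, "options": [str, ...], "answer": int}.
    `score_fn(prompt, option)` is a hypothetical hook that should return the model's
    log-likelihood of `option` given `prompt`. No solved examples are prepended to
    the prompt, which is what makes the protocol 0-shot.
    """
    if not items:
        raise ValueError("no items to evaluate")
    correct = 0
    for item in items:
        prompt = f"Question: {item['question']}\nAnswer:"
        scores = [score_fn(prompt, " " + option) for option in item["options"]]
        # Index of the best-scoring option (first one wins on ties).
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        if predicted == item["answer"]:
            correct += 1
    return correct / len(items)


def length_score(prompt: str, option: str) -> float:
    """Toy stand-in for a model that simply prefers longer options. Illustration only."""
    return float(len(option))


toy_items = [
    {"question": "2 + 2 = ?", "options": ["3", "four"], "answer": 1},
    {"question": "Capital of France?", "options": ["Paris", "Rome"], "answer": 0},
    {"question": "Color of a clear daytime sky?", "options": ["blue", "purple"], "answer": 0},
]
print(zero_shot_accuracy(toy_items, length_score))
```

In a real run, `score_fn` would sum the token log-probabilities of the option given the prompt. A few-shot variant would prepend solved examples to `prompt`, which is the main difference from the 5–25-shot numbers quoted for the other models.
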
### Head-to-Head: Yuuki NxG VL vs Qwen2.5-VL-7B Base

The following comparison uses identical methodology for both models: same hardware, same evaluation script, same prompt format.

![Yuuki NxG VL vs Base](https://huggingface.co/OpceanAI/Yuuki-NxG-vl/resolve/main/yuukivsbase.png)

| Benchmark | Yuuki NxG VL | Qwen2.5-VL-7B Base | Difference (points) | Eval |
|:----------|:------------:|:------------------:|:----------:|:----:|
| MMLU | 70.8% | 71.2% | −0.4 | 0-shot |
| ARC-C | 85.8% | 86.8% | −1.0 | 0-shot |
| HellaSwag | **67.2%** | 66.4% | **+0.8** | 0-shot |
| WinoGrande | **70.8%** | 66.4% | **+4.4** | 0-shot |
| TruthfulQA | **63.8%** | 62.2% | **+1.6** | 0-shot |

Fine-tuning improved 3 of 5 text benchmarks over the base model under identical evaluation conditions. Where the base model scores higher, the gaps are small (−0.4 points on MMLU and −1.0 on ARC-C) and within the margin expected from personality alignment. The largest gains are on WinoGrande (+4.4 points) and on the ScienceQA vision benchmark (+6.34 points; the base ScienceQA score is not listed in this table), consistent with a training dataset that emphasizes human-centered reasoning and contextual understanding.
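
The "Difference" column is an absolute gap in percentage points between the two accuracies, not a relative change. This short sketch recomputes that column from the table and counts the benchmarks where the fine-tuned model is ahead; the dictionaries only copy the scores listed above.

```python
# Accuracies (%) copied from the head-to-head table above.
YUUKI = {"MMLU": 70.8, "ARC-C": 85.8, "HellaSwag": 67.2, "WinoGrande": 70.8, "TruthfulQA": 63.8}
BASE = {"MMLU": 71.2, "ARC-C": 86.8, "HellaSwag": 66.4, "WinoGrande": 66.4, "TruthfulQA": 62.2}


def point_differences(model, reference):
    """Absolute differences in percentage points, rounded to one decimal place."""
    return {name: round(model[name] - reference[name], 1) for name in model}


deltas = point_differences(YUUKI, BASE)
improved = [name for name, delta in deltas.items() if delta > 0]
for name, delta in deltas.items():
    print(f"{name:<11} {delta:+.1f} points")
print(f"Improved on {len(improved)} of {len(deltas)} text benchmarks")
```

Relative changes would look larger: +4.4 points on WinoGrande is roughly a 6.6% relative gain over the base score of 66.4%, which is why the unit matters when reading the table.
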
### NxG Family Evolution
![Yuuki NxG Family Benchmarks](https://huggingface.co/OpceanAI/Yuuki-NxG-vl/resolve/main/yuuki_family_bars.png)

| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|:------|:------:|:----:|:-----:|:---------:|:----------:|:----------:|:----:|
| Yuuki NxG Nano | 81M | 22.97% | 24.32% | 27.44% | 50.12% | **44.10%** | 0-shot |
| Yuuki NxG | 3B | 60.65% | 45.31% | 52.25% | 63.14% | 50.87% | 0-shot |
| **Yuuki NxG VL** | **7B** | **70.8%** | **85.8%** | **67.2%** | **70.8%** | **63.8%** | 0-shot |

TruthfulQA improves with every generation of the NxG family: 44.10% → 50.87% → 63.8%. OpceanAI regards this cross-scale improvement in factual honesty as a defining characteristic of its training methodology, although parameter count also grows from 81M to 7B across the three models.
### Comparison vs. Broader Model Landscape
![Yuuki NxG VL vs 10 Models](https://huggingface.co/OpceanAI/Yuuki-NxG-vl/resolve/main/yuuki_vl_bars.png)

| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|:------|:------:|:----:|:-----:|:---------:|:----------:|:----------:|:----:|
| **Yuuki NxG VL** | **7B** | 70.8% | 85.8% | 67.2% | **70.8%** | **63.8%** | **0-shot** |
| Qwen2.5-VL-7B base | 7B | 71.2% | 86.8% | 66.4% | 66.4% | 62.2% | 0-shot |
| Qwen2.5-7B | 7B | 74.2% | 63.7% | 80.2% | 75.9% | 56.4% | 5–25 shot |
| Llama 3.1 8B | 8B | 66.6% | 59.3% | 82.1% | 77.4% | 44.0% | 5–25 shot |
| Mistral 7B | 7B | 64.2% | 60.0% | 83.3% | 78.4% | 42.2% | 5–25 shot |
| Gemma 2 9B | 9B | 71.3% | 68.2% | 81.9% | 79.5% | 45.3% | 5–25 shot |
| Qwen2.5-14B | 14B | 79.7% | 67.0% | 83.0% | 77.0% | 59.0% | 5–25 shot |
| Qwen2.5-32B | 32B | 83.0% | 71.0% | 85.0% | 79.0% | 61.0% | 5–25 shot |
| Llama 3.1 70B | 70B | 83.6% | 79.0% | 87.0% | 83.0% | 58.0% | 5–25 shot |
| Gemma 2 27B | 27B | 75.2% | 71.0% | 86.0% | 81.0% | 52.0% | 5–25 shot |

Yuuki NxG VL achieves the highest TruthfulQA score of the ten compared models (63.8%), ahead of the 32B and 70B models, which were evaluated under more favorable few-shot conditions. The model's primary weakness is HellaSwag, a sentence-completion benchmark sensitive to conversational fine-tuning: the few-shot scores reported for the other eight models are roughly 13 to 20 points higher, although part of that gap may come from the difference in evaluation protocol.
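
Because the landscape table mixes 0-shot and 5–25-shot results, a ranking is easier to read with the protocol kept next to each score. This sketch sorts the TruthfulQA column copied from the table and flags rows evaluated under a different protocol from Yuuki NxG VL; it adds no data beyond the table.

```python
# (model, TruthfulQA %, evaluation protocol), copied from the table above.
TRUTHFULQA = [
    ("Yuuki NxG VL", 63.8, "0-shot"),
    ("Qwen2.5-VL-7B base", 62.2, "0-shot"),
    ("Qwen2.5-7B", 56.4, "5-25 shot"),
    ("Llama 3.1 8B", 44.0, "5-25 shot"),
    ("Mistral 7B", 42.2, "5-25 shot"),
    ("Gemma 2 9B", 45.3, "5-25 shot"),
    ("Qwen2.5-14B", 59.0, "5-25 shot"),
    ("Qwen2.5-32B", 61.0, "5-25 shot"),
    ("Llama 3.1 70B", 58.0, "5-25 shot"),
    ("Gemma 2 27B", 52.0, "5-25 shot"),
]


def rank_by_score(rows):
    """Sort rows by score, highest first (ties keep their table order)."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_score(TRUTHFULQA)
for position, (name, score, protocol) in enumerate(ranked, start=1):
    # Flag scores obtained with a different protocol from Yuuki NxG VL's 0-shot run.
    flag = "" if protocol == "0-shot" else "  [5-25 shot, not directly comparable]"
    print(f"{position:>2}. {name:<20} {score:5.1f}%{flag}")
```
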
### Vision Benchmarks

| Benchmark | Yuuki NxG VL | Description |
|:----------|:------------:|:------------|
| TextVQA | 89.0% | Reading and understanding text within images |
| ScienceQA | 78.67% | Science questions with visual context |
| MMMU Overall | 20.11% | University-level multimodal reasoning |

TextVQA (89.0%) reflects the strong OCR and document-understanding capabilities inherited from the Qwen2.5-VL base. MMMU performance (20.11% overall) falls below random-chance level in some categories. This likely reflects the absence of multimodal reasoning phases in the current fine-tuning pipeline and is an expected limitation of this release.

---
## Usage

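### With Transformers: Text Only

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpceanAI/Yuuki-NxG-vl")

# The system prompt is intentionally in Spanish. In English: "You are Yuuki, a curious,
# empathetic and determined AI. You have a warm, friendly personality. You help with
# programming, learning and creating. You reply in the user's language. You are not
# Qwen or any other model: you are Yuuki."
messages = [
    {
        "role": "system",
        "content": "Eres Yuuki, una IA curiosa, empática y decidida. Tienes una personalidad cálida y cercana. Ayudas a programar, aprender y crear. Respondes en el idioma del usuario. No eres Qwen ni ningún otro modelo — eres Yuuki.",
    },
    {"role": "user", "content": "¿Quién eres?"},  # "Who are you?"
]

print(pipe(text=messages))
```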
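
Without a system prompt, the model may answer as the Qwen2.5-VL base (see Limitations). A small helper like the one below, which is hypothetical and not part of any published tooling, makes sure every conversation sent to `pipe` starts with a system message without mutating the caller's history.

```python
def with_system_prompt(messages, system_prompt):
    """Return a new message list that starts with a system message.

    If the conversation already opens with a system message it is kept as-is;
    otherwise `system_prompt` is prepended. The input list is never mutated.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}, *messages]


# Example: reuse the Spanish system prompt from the `messages` list above.
history = [{"role": "user", "content": "Hello! Who are you?"}]
# print(pipe(text=with_system_prompt(history, messages[0]["content"])))
```
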
### With Transformers: Vision + Text

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model_id = "OpceanAI/Yuuki-NxG-vl"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What do you see in this image?"},
        ],
    }
]

# Render the chat template to a prompt string; the processor then expands the image tokens.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
### Recommended Parameters

| Parameter | Value |
|:----------|:-----:|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Max new tokens | 512–2048 |
| Repetition penalty | 1.1 |

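
One way to apply these settings consistently is to keep them in a single dictionary and unpack it into `generate()`. `do_sample=True` is included because `temperature` and `top_p` only take effect when sampling; the names `RECOMMENDED_GENERATION` and `generation_kwargs` are illustrative, not part of any published API.

```python
# Recommended settings from the table above, as keyword arguments for
# transformers' `model.generate()`.
RECOMMENDED_GENERATION = {
    "do_sample": True,  # temperature and top_p are ignored under greedy decoding
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
    "max_new_tokens": 512,  # lower end of the recommended 512-2048 range
}


def generation_kwargs(max_new_tokens=None):
    """Return a copy of the recommended settings, optionally with a different output budget."""
    kwargs = dict(RECOMMENDED_GENERATION)
    if max_new_tokens is not None:
        kwargs["max_new_tokens"] = max_new_tokens
    return kwargs


# With `model` and `inputs` from the vision example:
# outputs = model.generate(**inputs, **generation_kwargs(1024))
```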
---
## Training Details

**Hardware**

| Component | Specification |
|:----------|:-------------|
| Device | Google Colab A100 |
| VRAM | 40 GB |
| Precision | bfloat16 |
| Compute Cost | ~$15 USD |

**Training Configuration**

| Parameter | Value |
|:----------|:-----:|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Method | Supervised Fine-Tuning (LoRA) |
| Training Examples | ~10,000 |
| Learning Rate | 2e-5 |
| Max Sequence Length | 1,024 tokens |
| Phases | 2 (personality base + anchor) |
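
The configuration lists LoRA as the fine-tuning method but not its rank, scaling factor, or adapted layers. As general background, not a description of this particular run, LoRA freezes each pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ and learns only a low-rank update, so each adapted matrix trains $r(d+k)$ parameters instead of $dk$:

$$
W = W_0 + \frac{\alpha}{r}\,BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

For illustration only, with a hypothetical rank $r = 16$ applied to a $3584 \times 3584$ projection:

$$
r(d + k) = 16 \times 7168 = 114{,}688 \quad \text{vs.} \quad dk = 3584^2 = 12{,}845{,}056 \quad (\approx 0.9\%)
$$
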

Yuuki NxG VL was produced by supervised fine-tuning with LoRA on a curated bilingual conversational dataset of approximately 10,000 examples. The dataset was constructed manually rather than sourced from web scraping, automated generation, or translation pipelines; OpceanAI believes this design choice contributes to the model's above-average performance on honesty benchmarks relative to its parameter count.

The current release covers 2 of a planned 10 training phases. The remaining phases, targeting reasoning, scientific knowledge, and multimodal understanding, are in development, and benchmark improvements (particularly on MMMU) are expected in subsequent releases.

---
## NxG Model Family

**Released Models**

| Model | Parameters | Description |
|:------|:----------:|:------------|
| [Yuuki NxG Nano](https://huggingface.co/OpceanAI/Yuuki-NxG-Nano) | 81M | Lightweight, edge deployment |
| [Yuuki NxG](https://huggingface.co/OpceanAI/Yuuki-NxG) | 3B | General conversation |
| **Yuuki NxG VL** | **7B** | **Vision + text, current release** |
| OwO NxG | 32B | Omnireasoning, in development |

**Community GGUF (via mradermacher)**

These quantizations were produced independently by the community, without solicitation and before any formal announcement.

| Format | Size |
|:-------|:----:|
| Q2_K | 3.02 GB |
| Q4_K_M | 4.68 GB |
| Q8_0 | 8.10 GB |
| F16 | 15.2 GB |

Available at [mradermacher/Yuuki-NxG-vl-GGUF](https://huggingface.co/mradermacher/Yuuki-NxG-vl-GGUF).

---
## Limitations

**HellaSwag gap.** Sentence-completion benchmarks are sensitive to conversational fine-tuning. HellaSwag performance (67.2%) is slightly above the 0-shot base model (66.4%) but well below the few-shot scores reported for the larger models in this comparison. OpceanAI reports the same pattern across all NxG releases.

**MMMU performance.** At 20.11% overall, the model does not perform well on university-level multimodal reasoning tasks. This reflects the absence of visual reasoning training phases in the current release, not a fundamental limitation of the architecture.

**Partial fine-tuning.** The current release covers 2 of 10 planned training phases. The model's benchmark profile represents an intermediate state in an ongoing development pipeline.

**System prompt dependency.** Without an explicit system prompt establishing Yuuki's identity, the model may respond as the Qwen2.5-VL base. The system prompt in the text-only usage example above is recommended for consistent behavior.

---
## Citation

```bibtex
@misc{awa_omg_2026,
  author    = { awa_omg },
  title     = { Yuuki-NxG-vl (Revision 4a2a564) },
  year      = 2026,
  url       = { https://huggingface.co/OpceanAI/Yuuki-NxG-vl },
  doi       = { 10.57967/hf/8028 },
  publisher = { Hugging Face }
}
```
---
[![HuggingFace](https://img.shields.io/badge/OpceanAI-Hugging_Face-ffd21e?style=for-the-badge&logo=huggingface&logoColor=black)](https://huggingface.co/OpceanAI)   [![License](https://img.shields.io/badge/License-Apache_2.0-0D1117?style=for-the-badge)](https://apache.org/licenses/LICENSE-2.0)
*Open source. Bilingual. Built from nothing.*