---
license: apache-2.0
datasets:
- OpceanAI/Yuuki-dataset
- OpceanAI/Yuuki-Personality
language:
- en
- es
metrics:
- perplexity
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
library_name: transformers
tags:
- vision-language
- multimodal
- pytorch
- unsloth
- personality
- bilingual
- opceanai
- yuuki
- fine-tuned
- chat
pipeline_tag: image-text-to-text
---
# A 7B Vision-Language Model Fine-Tuned for Bilingual Conversation
**Multimodal companion model with verified benchmark improvements over its base.**
**Qwen2.5-VL architecture. 7 billion parameters. Vision + Text. Apache 2.0.**
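[License: Apache 2.0](LICENSE) · [Base model: Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) · [Library: Transformers](https://huggingface.co/docs/transformers) · [DOI: 10.57967/hf/8028](https://doi.org/10.57967/hf/8028)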
---
## What is Yuuki NxG VL?
**Yuuki NxG VL** is a 7-billion-parameter vision-language model fine-tuned from [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for bilingual open-ended conversation and visual understanding. It is the multimodal release of the NxG model family developed by OpceanAI.
The model was fine-tuned on a curated bilingual dataset without proprietary infrastructure. All benchmark evaluations were run with a custom 0-shot evaluation script on a Google Colab A100 GPU.
Although fine-tuning often lowers a base model's benchmark scores, Yuuki NxG VL improves on the base model on 5 of 8 benchmarks in a direct head-to-head comparison using identical methodology. It also achieves the **highest TruthfulQA score** among the 10 models compared below, including models up to 70B parameters whose scores come from few-shot evaluations in their official reports.
---
## Model Summary
**Architecture**
| Property | Value |
|:---------|:------|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | 7B |
| Modalities | Vision + Text |
| Fine-tuning | Supervised fine-tuning (LoRA) |
| Training Examples | ~10,000 |
| Context Length | 2,048 tokens |
**Release**
| Property | Value |
|:---------|:------|
| Organization | OpceanAI |
| Release Date | March 2026 |
| Languages | English, Spanish |
| License | Apache 2.0 |
| Evaluation | Custom 0-shot script |
| Compute Budget | ~$15 USD |
---
## Benchmark Results
All Yuuki NxG VL results were evaluated **0-shot** with a custom evaluation script. Except for the Qwen2.5-VL-7B base, which was re-evaluated with the same script, competitor scores are taken from official technical reports that use few-shot prompting (5–25 shots). Because few-shot prompting typically raises scores on these benchmarks, direct numerical comparison with those reports systematically favors the competitor models.
### Head-to-Head: Yuuki NxG VL vs Qwen2.5-VL-7B Base
The following comparison uses identical methodology — same hardware, same evaluation script, same prompt format — for both models.

| Benchmark | Yuuki NxG VL | Qwen2.5-VL-7B Base | Difference (pts) | Eval |
|:----------|:------------:|:------------------:|:----------------:|:----:|
| MMLU | 70.8% | 71.2% | −0.4 | 0-shot |
| ARC-C | 85.8% | 86.8% | −1.0 | 0-shot |
| HellaSwag | **67.2%** | 66.4% | **+0.8** | 0-shot |
| WinoGrande | **70.8%** | 66.4% | **+4.4** | 0-shot |
| TruthfulQA | **63.8%** | 62.2% | **+1.6** | 0-shot |
Fine-tuning improved 3 of 5 text benchmarks over the base model under identical evaluation conditions. Where the base model scores higher, the gaps are small (0.4 percentage points on MMLU and 1.0 on ARC-C), in line with the minor trade-offs expected from personality alignment. The largest gains are on WinoGrande (+4.4 points) and ScienceQA (+6.34%, a vision benchmark whose base-model score is not shown in the table), consistent with a training dataset that emphasizes human-centered reasoning and contextual understanding.
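
The differences in the head-to-head table are absolute gaps in percentage points (fine-tuned minus base), not relative changes; for example, the 4.4-point WinoGrande gain is a relative improvement of about 6.6% over 66.4%. A small illustrative sketch that recomputes the gaps from the table values and counts the improvements:

```python
# Scores (%) copied from the head-to-head table: (Yuuki NxG VL, Qwen2.5-VL-7B base)
SCORES = {
    "MMLU": (70.8, 71.2),
    "ARC-C": (85.8, 86.8),
    "HellaSwag": (67.2, 66.4),
    "WinoGrande": (70.8, 66.4),
    "TruthfulQA": (63.8, 62.2),
}


def deltas_pp(table: dict) -> dict:
    """Fine-tuned minus base, in percentage points, rounded to one decimal."""
    return {name: round(ft - base, 1) for name, (ft, base) in table.items()}


deltas = deltas_pp(SCORES)
improved = [name for name, delta in deltas.items() if delta > 0]

if __name__ == "__main__":
    print(deltas)
    print(f"{len(improved)} of {len(deltas)} improved: {improved}")
```
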
### NxG Family Evolution

| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|:------|:------:|:----:|:-----:|:---------:|:----------:|:----------:|:----:|
| Yuuki NxG Nano | 81M | 22.97% | 24.32% | 27.44% | 50.12% | **44.10%** | 0-shot |
| Yuuki NxG | 3B | 60.65% | 45.31% | 52.25% | 63.14% | 50.87% | 0-shot |
| **Yuuki NxG VL** | **7B** | **70.8%** | **85.8%** | **67.2%** | **70.8%** | **63.8%** | 0-shot |
TruthfulQA rises at every step of the NxG family: 44.10% → 50.87% → 63.8%. OpceanAI regards this cross-scale improvement in factual honesty as a defining characteristic of its training methodology.
### Comparison vs. Broader Model Landscape

| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|:------|:------:|:----:|:-----:|:---------:|:----------:|:----------:|:----:|
| **Yuuki NxG VL** | **7B** | 70.8% | 85.8% | 67.2% | **70.8%** | **63.8%** | **0-shot** |
| Qwen2.5-VL-7B base | 7B | 71.2% | 86.8% | 66.4% | 66.4% | 62.2% | 0-shot |
| Qwen2.5-7B | 7B | 74.2% | 63.7% | 80.2% | 75.9% | 56.4% | 5–25 shot |
| Llama 3.1 8B | 8B | 66.6% | 59.3% | 82.1% | 77.4% | 44.0% | 5–25 shot |
| Mistral 7B | 7B | 64.2% | 60.0% | 83.3% | 78.4% | 42.2% | 5–25 shot |
| Gemma 2 9B | 9B | 71.3% | 68.2% | 81.9% | 79.5% | 45.3% | 5–25 shot |
| Qwen2.5-14B | 14B | 79.7% | 67.0% | 83.0% | 77.0% | 59.0% | 5–25 shot |
| Qwen2.5-32B | 32B | 83.0% | 71.0% | 85.0% | 79.0% | 61.0% | 5–25 shot |
| Llama 3.1 70B | 70B | 83.6% | 79.0% | 87.0% | 83.0% | 58.0% | 5–25 shot |
| Gemma 2 27B | 27B | 75.2% | 71.0% | 86.0% | 81.0% | 52.0% | 5–25 shot |
Yuuki NxG VL achieves the highest TruthfulQA score among all ten compared models, including the 32B and 70B models evaluated under more favorable few-shot conditions. Its weakest benchmark relative to the field is HellaSwag, a sentence-completion benchmark on which every model evaluated with few-shot prompting scores above 80%. Because the Qwen2.5-VL-7B base scores similarly (66.4%) under the same 0-shot script, much of this gap likely reflects the evaluation setting rather than fine-tuning.
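
Ranking the TruthfulQA column while keeping each model's evaluation setting next to its score reproduces the claim above and also makes visible that the ranking mixes 0-shot and few-shot results. The scores are copied from the comparison table; the code is only an illustration.

```python
# (model, TruthfulQA %, evaluation setting), copied from the comparison table
ROWS = [
    ("Yuuki NxG VL", 63.8, "0-shot"),
    ("Qwen2.5-VL-7B base", 62.2, "0-shot"),
    ("Qwen2.5-7B", 56.4, "5-25 shot"),
    ("Llama 3.1 8B", 44.0, "5-25 shot"),
    ("Mistral 7B", 42.2, "5-25 shot"),
    ("Gemma 2 9B", 45.3, "5-25 shot"),
    ("Qwen2.5-14B", 59.0, "5-25 shot"),
    ("Qwen2.5-32B", 61.0, "5-25 shot"),
    ("Llama 3.1 70B", 58.0, "5-25 shot"),
    ("Gemma 2 27B", 52.0, "5-25 shot"),
]

# Sort by score, highest first; the setting travels with each score
ranked = sorted(ROWS, key=lambda row: row[1], reverse=True)

if __name__ == "__main__":
    for rank, (name, score, setting) in enumerate(ranked, start=1):
        print(f"{rank:>2}. {name:<20} {score:5.1f}%  ({setting})")
```
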
### Vision Benchmarks
| Benchmark | Yuuki NxG VL | Description |
|:----------|:------------:|:------------|
| TextVQA | 89.0% | Reading and understanding text within images |
| ScienceQA | 78.67% | Science questions with visual context |
| MMMU Overall | 20.11% | University-level multimodal reasoning |
TextVQA (89.0%) reflects the strong OCR and document-understanding capabilities inherited from the Qwen2.5-VL base. MMMU performance (20.11% overall) falls below the random-chance baseline in some categories. This reflects the absence of multimodal reasoning phases in the current fine-tuning pipeline and is an expected limitation of this release.
---
## Usage
### With Transformers — Text Only
```python
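from transformers import pipeline

# The image-text-to-text pipeline also accepts text-only chats
pipe = pipeline(
    "image-text-to-text",
    model="OpceanAI/Yuuki-NxG-vl",
    device_map="auto",   # place weights on the available GPU(s)
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)
messages = [
    {
        "role": "system",
        # English: "You are Yuuki, a curious, empathetic and determined AI. You have a warm,
        # friendly personality. You help people code, learn and create. You reply in the
        # user's language. You are not Qwen or any other model; you are Yuuki."
        "content": "Eres Yuuki, una IA curiosa, empática y decidida. Tienes una personalidad cálida y cercana. Ayudas a programar, aprender y crear. Respondes en el idioma del usuario. No eres Qwen ni ningún otro modelo — eres Yuuki."
    },
    {
        "role": "user",
        "content": "¿Quién eres?",  # "Who are you?"
    },
]

# max_new_tokens bounds the length of the reply
print(pipe(text=messages, max_new_tokens=512))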
```
### With Transformers — Vision + Text
```python
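from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model_id = "OpceanAI/Yuuki-NxG-vl"

# Load weights in bfloat16 and let Accelerate place them on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Convert to RGB so grayscale, RGBA and palette images are handled uniformly
image = Image.open("image.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What do you see in this image?"},
        ],
    }
]

# Render the chat template; the image entry becomes an image placeholder in the prompt
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Tokenize the prompt and preprocess the image together, then move tensors to the model's device
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
).to(model.device)

# Sample with the settings from the Recommended Parameters table
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, skipping the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))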
```
### Recommended Parameters
| Parameter | Value |
|:----------|:-----:|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Max new tokens | 512–2048 |
| Repetition penalty | 1.1 |
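
The recommended settings can be kept in one place as keyword arguments for `model.generate`. This is a minimal sketch; the constant and helper names are hypothetical. `do_sample=True` is included because in Transformers `temperature` and `top_p` only take effect when sampling is enabled.

```python
# Hypothetical constant collecting the recommended sampling settings
RECOMMENDED_GENERATION_KWARGS = {
    "do_sample": True,          # required for temperature/top_p to have any effect
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
    "max_new_tokens": 512,      # lower end of the recommended 512-2048 range
}


def generation_kwargs(max_new_tokens: int = 512) -> dict:
    """Return a copy of the recommended settings with a custom reply length."""
    return {**RECOMMENDED_GENERATION_KWARGS, "max_new_tokens": max_new_tokens}


if __name__ == "__main__":
    print(generation_kwargs(1024))
```

Usage with the vision example: `outputs = model.generate(**inputs, **generation_kwargs(1024))`. Returning a copy keeps the shared defaults unchanged when one call needs a longer reply.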
---
## Training Details
**Hardware**
| Component | Specification |
|:----------|:-------------|
| Device | Google Colab A100 |
| VRAM | 40 GB |
| Precision | bfloat16 |
| Compute Cost | ~$15 USD |
**Training Configuration**
| Parameter | Value |
|:----------|:-----:|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Method | Supervised Fine-Tuning (LoRA) |
| Training Examples | ~10,000 |
| Learning Rate | 2e-5 |
| Max Sequence Length | 1,024 tokens |
| Phases | 2 (personality base + anchor) |
Yuuki NxG VL was produced through supervised fine-tuning with LoRA on a curated bilingual conversational dataset of approximately 10,000 examples. The training dataset was constructed manually rather than sourced from internet scraping, automated generation, or translation pipelines. OpceanAI attributes the model's above-average performance on honesty benchmarks, relative to its parameter count, in part to this design choice.
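
LoRA keeps each pretrained weight matrix frozen and trains a low-rank update instead: for a weight of shape d_out × d_in it learns B (d_out × r) and A (r × d_in), adding r·(d_in + d_out) trainable parameters rather than updating all d_in·d_out. The sketch below compares the two counts for one square layer. The 4096 dimension and rank 16 are hypothetical illustration values; this card does not report the LoRA rank or which modules were adapted.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning a d_out x d_in weight matrix directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a LoRA update B @ A, with B: d_out x rank and A: rank x d_in."""
    return rank * (d_in + d_out)


# Hypothetical illustration values (not reported in this card)
D_MODEL = 4096
RANK = 16

full = full_finetune_params(D_MODEL, D_MODEL)
lora = lora_params(D_MODEL, D_MODEL, RANK)

if __name__ == "__main__":
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
    # full: 16,777,216  lora: 131,072  ratio: 0.78%
```

Training under 1% of each adapted matrix's parameters is one reason LoRA fine-tuning of a 7B model can fit on a single 40 GB GPU.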
The current release covers 2 of a planned 10 training phases. Remaining phases targeting reasoning, scientific knowledge, and multimodal understanding are in development. Benchmark improvements — particularly in MMMU — are expected in subsequent releases.
---
## NxG Model Family
**Released Models**
| Model | Parameters | Description |
|:------|:----------:|:------------|
| [Yuuki NxG Nano](https://huggingface.co/OpceanAI/Yuuki-NxG-Nano) | 81M | Lightweight, edge deployment |
| [Yuuki NxG](https://huggingface.co/OpceanAI/Yuuki-NxG) | 3B | General conversation |
| **Yuuki NxG VL** | **7B** | **Vision + text, current release** |
| OwO NxG | 32B | Omnireasoning — in development |
**Community GGUF (via mradermacher)**
Quantized independently and without solicitation by mradermacher, an instance of organic community adoption before any formal announcement of the model.
| Format | Size |
|:-------|:----:|
| Q2_K | 3.02 GB |
| Q4_K_M | 4.68 GB |
| Q8_0 | 8.10 GB |
| F16 | 15.2 GB |
Available at [mradermacher/Yuuki-NxG-vl-GGUF](https://huggingface.co/mradermacher/Yuuki-NxG-vl-GGUF).
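
As a rough sanity check on the file sizes above, a GGUF file's size is approximately the number of weights times the bits stored per weight, divided by 8. The sketch assumes about 7.61 billion weights (the size of the Qwen2.5-7B language model) and assumes the vision encoder is not included in these files. It covers only the fixed-width formats: F16 stores 16 bits per weight, and Q8_0 stores 8.5 (32 int8 values plus one 16-bit scale per block). Q2_K and Q4_K_M mix block formats across tensors, so they have no single bits-per-weight value.

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GB (1e9 bytes), ignoring metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~7.61e9 language-model weights; vision encoder stored separately
N_PARAMS = 7.61e9

if __name__ == "__main__":
    for fmt, bpw in [("F16", 16.0), ("Q8_0", 8.5)]:
        print(f"{fmt}: ~{estimate_gguf_size_gb(N_PARAMS, bpw):.1f} GB")
```
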
---
## Limitations
**HellaSwag performance.** Sentence-completion benchmarks can be sensitive to conversational fine-tuning. At 67.2%, HellaSwag is the model's weakest result relative to the other models in this comparison, although it is slightly above the Qwen2.5-VL-7B base (66.4%) under identical 0-shot evaluation. Comparatively low HellaSwag scores are consistent across all NxG releases.
**MMMU performance.** At 20.11% overall, the model does not perform well on university-level multimodal reasoning tasks. This reflects the absence of visual reasoning training phases in the current release, not a fundamental limitation of the architecture.
**Partial fine-tuning.** The current release covers 2 of 10 planned training phases. The model's benchmark profile represents an intermediate state in an ongoing development pipeline.
**System prompt dependency.** Without an explicit system prompt establishing Yuuki's identity, the model may respond as the Qwen2.5-VL base. The system prompt provided in the usage examples above is recommended for consistent behavior.
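
One way to avoid omitting the system prompt is to prepend it automatically before calling `apply_chat_template` or the pipeline. The helper below is a hypothetical sketch, not part of any library; it reuses the system prompt from the text-only example verbatim and leaves conversations that already start with a system message unchanged.

```python
# System prompt copied verbatim from the text-only usage example
YUUKI_SYSTEM_PROMPT = (
    "Eres Yuuki, una IA curiosa, empática y decidida. Tienes una personalidad cálida y cercana. "
    "Ayudas a programar, aprender y crear. Respondes en el idioma del usuario. "
    "No eres Qwen ni ningún otro modelo — eres Yuuki."
)


def ensure_system_prompt(messages: list, system_prompt: str = YUUKI_SYSTEM_PROMPT) -> list:
    """Return a new message list that starts with a system message.

    If the first message is already a system message, the list is copied unchanged;
    otherwise the Yuuki system prompt is inserted at the front.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}, *messages]


if __name__ == "__main__":
    chat = [{"role": "user", "content": "Who are you?"}]
    print(ensure_system_prompt(chat)[0]["role"])
```

Usage with the vision example: call `messages = ensure_system_prompt(messages)` before `processor.apply_chat_template(...)`.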
---
## Citation
```bibtex
@misc{awa_omg_2026,
author = { awa_omg },
title = { Yuuki-NxG-vl (Revision 4a2a564) },
year = 2026,
url = { https://huggingface.co/OpceanAI/Yuuki-NxG-vl },
doi = { 10.57967/hf/8028 },
publisher = { Hugging Face }
}
```
---
[OpceanAI on Hugging Face](https://huggingface.co/OpceanAI) · [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0)
*Open source. Bilingual. Built from nothing.*