How to use from SGLang
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
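The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `chat` are illustrative and not part of SGLang, and the response parsing assumes the usual OpenAI-compatible shape (`choices[0].message.content`).

```python
import json
import urllib.request

MODEL = "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:30000") -> str:
    """POST the payload to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # Assumption: OpenAI-compatible responses carry the text here.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat(build_chat_payload("Describe this image in one sentence.", image_url))` should return the model's reply as a string.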
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4" \
        --host 0.0.0.0 \
        --port 30000
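The container needs time to download and load the model before it accepts requests. A small polling helper can wait for it; this is a sketch that assumes the server answers on the OpenAI-style `/v1/models` route once it is ready. The function names and default timings are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe_models_endpoint(base_url: str) -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, or timed out: server not ready yet.
        return False


def wait_until_ready(
    base_url: str = "http://localhost:30000",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = probe_models_endpoint,
) -> bool:
    """Poll the server until the probe succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The `probe` argument is injectable so the loop can be tested without a running server or swapped for a different readiness check.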

NOESIS-Qwen3-VL-2B-UI-Venus-NF4 (NOESIS DHCF-FNO bundle)

An NF4-quantized derivative of inclusionAI/UI-Venus-1.5-2B, an end-to-end GUI agent trained on top of Qwen3-VL-2B with a 4-stage post-training pipeline (Mid-Train → Offline-RL → Online-RL → Model-Merge). The weights were quantized with bitsandbytes 0.49.2 (double quantization, bf16 compute) from the intermediate AMAImedia BF16 repack, UI-Venus-1.5-2B-NOESIS-BF16.

Used inside the NOESIS DHCF-FNO stack as the FALLBACK 2B agent on the public ui-agent.amaimedia.com subdomain (browser DOM automation), providing alternative-pipeline cross-validation against the PRIMARY MAI-UI 8B NF4 path per R-AGENT-PRIMARY-MAI-UI-8B-NF4.

APACHE 2.0 — COMMERCIAL USE PERMITTED. End-to-end clean lineage (Alibaba Cloud / Qwen Team Apache 2.0 → Inclusion AI / Ant Group Venus Team Apache 2.0 → AMAImedia BF16 repack Apache 2.0 → AMAImedia NF4 Apache 2.0). Standard transformers.from_pretrained loading with device_map={"": 0} (NF4 requirement per CLAUDE.md GOLDEN RULE 2).

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).

  • Founder: Ilia Bolotnikov
  • Organization: AMAImedia.com
  • X (Twitter): @AMAImediacom
  • LinkedIn: Ilia Bolotnikov
  • Telegram: @AMAImediacom
  • NOESIS version: v15.8
  • Quantization date: 2026-05-21 08:32:59
  • Renamed from: NOESIS-UI-Venus-2B-NF4 (per R-NOESIS-FOLDER-NAMING-PREFIX-QWEN3-VL)

NOESIS role — fallback 2B agent on ui-agent.amaimedia.com

Browser DOM automation agent mounted on ui-agent.amaimedia.com (the Phase 2 desktop agent / auto-clipper UI navigation subdomain) as the FALLBACK tier of a 3-tier hierarchy. It cross-validates MAI-UI's grounding output using a fundamentally different training pipeline: Venus's 4-stage RFT pipeline versus MAI-UI's self-evolving data and device-cloud collaboration.

ui-agent.amaimedia.com (browser DOM automation)
        │
        ├── PRIMARY  : NOESIS-Qwen3-VL-8B-MAI-UI-NF4    (Tongyi MAI-UI 8B, ~5 GB VRAM)
        │              R-AGENT-PRIMARY-MAI-UI-8B-NF4
        │
        ├── SECONDARY: NOESIS-Qwen3-VL-2B-MAI-UI-NF4    (Tongyi MAI-UI 2B, ~1.6 GB VRAM)
        │              low-VRAM fallback for primary
        │
        └── FALLBACK : NOESIS-Qwen3-VL-2B-UI-Venus-NF4  (this, Inclusion AI Venus 1.5 2B)
                       • alternative 4-stage RFT pipeline
                       • cross-validation against MAI-UI
                       • ~1.2 GB target / 3.45 GB load peak VRAM

The Venus variant exists as the cross-validation track — when MAI-UI gives ambiguous click coordinates or fails to ground an element, Venus provides a second opinion from an entirely separate training pipeline (RFT-based, not RL-from-environment).
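The second-opinion idea can be sketched as a comparison of two click predictions. This is an illustration only, not the NOESIS implementation: the function name, the verdict strings, and the 20-pixel tolerance are all assumptions.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def cross_validate_click(
    primary: Optional[Point],
    fallback: Optional[Point],
    tolerance_px: float = 20.0,  # hypothetical threshold
) -> Tuple[Optional[Point], str]:
    """Combine two click predictions into one decision.

    Returns (point, verdict), where verdict is one of
    "agree", "disagree", "primary_only", "fallback_only", "none".
    """
    if primary is None and fallback is None:
        return None, "none"
    if fallback is None:
        return primary, "primary_only"
    if primary is None:
        # Primary failed to ground the element: use the second opinion.
        return fallback, "fallback_only"
    if math.dist(primary, fallback) <= tolerance_px:
        return primary, "agree"
    # The two pipelines point at different places: do not click blindly.
    return None, "disagree"
```

Returning no point on disagreement leaves the final decision to the caller, for example to retry with a fresh screenshot or to ask for confirmation.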

| Property | Value |
| --- | --- |
| Immediate parent | UI-Venus-1.5-2B-NOESIS-BF16 (AMAImedia BF16 repack of inclusionAI/UI-Venus-1.5-2B) |
| Upstream lineage | Qwen/Qwen3-VL-2B (Apache 2.0) → inclusionAI/UI-Venus-1.5-2B (Apache 2.0) → AMAImedia BF16 repack → AMAImedia NF4 |
| Architecture | Qwen3VLForConditionalGeneration (multimodal, vision tower retained) |
| Text | hidden 2048 / 28 layers / 16 heads (GQA 2:1, 8 KV heads) |
| Vision tower | depth 24, hidden 1024, patch 16, deepstack at layers [5,11,17] |
| Vocab size | 151,936 |
| Context | 262,144 (mRoPE [24,20,20] interleaved, rope_theta 5M) |
| Format | NF4 (bnb 4-bit, double-quant, bf16 compute) |
| Bundle size on disk | 2.19 GB (single safetensors) |
| VRAM target (inference) | 1.2 GB ✅ RTX 3060 6 GB |
| VRAM peak (load) | 3.45 GB |
| License | Apache 2.0 (commercial-ok) |
| Project page | https://ui-venus.github.io/UI-Venus-1.5 |
| Papers | arxiv:2602.09082 (Venus 1.5), arxiv:2508.10833 (Venus RFT) |
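A few of these figures can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that the table does not state: that the attention head dimension equals hidden size divided by head count, and that the KV cache is held in bf16 (2 bytes per value).

```python
# Values from the property table above.
HIDDEN_SIZE = 2048
NUM_LAYERS = 28
NUM_HEADS = 16
NUM_KV_HEADS = 8
MROPE_SECTIONS = [24, 20, 20]
MAX_CONTEXT = 262_144

# Assumptions (not stated in the table): head_dim = hidden / heads,
# and a bf16 KV cache at 2 bytes per value.
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS
BYTES_PER_VALUE = 2

gqa_ratio = NUM_HEADS // NUM_KV_HEADS
rotary_pairs = sum(MROPE_SECTIONS)

# Keys and values (factor 2) for every layer and every KV head.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
kv_gib_full_context = kv_bytes_per_token * MAX_CONTEXT / 2**30

print(f"GQA ratio: {gqa_ratio}:1")
print(f"head_dim: {HEAD_DIM}, rotary pairs: {rotary_pairs}")
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {kv_gib_full_context:.0f} GiB")
```

Under these assumptions the three mRoPE sections cover 64 rotary pairs, which is exactly the 128 head dimensions, and the KV cache costs 112 KiB per token. A full 262,144-token context would then need about 28 GiB of cache, so usable context on a 6 GB card is far shorter than the architectural maximum.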

Upstream Venus Team documentation (preserved)

UI-Venus-1.5-2B — End-to-end GUI Agent

UI-Venus-1.5 is a unified end-to-end GUI Agent designed for robust real-world applications. The model family includes two dense variants (2B / 8B) and one MoE variant (30B-A3B). This folder hosts the 2B dense variant, NF4-quantized.

Training pipeline (4 stages)

Stage 1 — Mid-Training
  10B tokens across 30+ GUI datasets
  Foundational GUI semantics

Stage 2 — Offline-RL
  Task-specific optimization:
    • grounding (ScreenSpot / OSWorld-G / VenusBench-GD)
    • mobile  (AndroidWorld / AndroidLab / VenusBench-Mobile)
    • web     (WebVoyager / OSWorld-W)

Stage 3 — Online-RL
  Full-trajectory rollouts for long-horizon dynamic navigation
  RFT (Reinforcement Fine-Tuning) per arXiv 2508.10833

Stage 4 — Model Merge
  Unifying specialists into single deployable checkpoint

Benchmarks (per upstream Venus Team report)

| Benchmark | 30B-A3B variant | This 2B variant (proportional) |
| --- | --- | --- |
| ScreenSpot-Pro | 69.6% | 57.7% |
| VenusBench-GD | 75.0% | (scales with size) |
| OSWorld-G-R | 76.4% | (scales with size) |
| OSWorld-G | 70.6% | (scales with size) |
| UI-Vision | 54.7% | (scales with size) |
| AndroidWorld | 77.6% | (scales with size) |
| AndroidLab | 55.1% / 68.1% | (scales with size) |
| VenusBench-Mobile | 21.5% | (scales with size) |
| WebVoyager | 76.0% | (scales with size) |

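Numbers are taken from the upstream Venus Team report. Apart from ScreenSpot-Pro, 2B scores are not listed here; per the "Consistent Scaling Gains" note in the upstream README, they are expected to be lower than the 30B-A3B figures.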


Quantization details (NOESIS-side)

| Parameter | Value |
| --- | --- |
| Library | bitsandbytes 0.49.2 |
| Method | NF4 (Normalized Float 4-bit) |
| bnb_4bit_use_double_quant | True (saves ~5% via nested quant) |
| bnb_4bit_compute_dtype | bfloat16 |
| Device map | {"": 0} (R-NF4-DEVICE-MAP-EXPLICIT) |
| Source dir | D:\models\vlm-gui-mot\UI-Venus-1.5-2B-NOESIS-BF16 |
| Output disk size | 2.19 GB (single safetensors) |
| VRAM target (inference) | 1.2 GB |
| VRAM peak (load) | 3.45 GB |
| Quant date | 2026-05-21 08:32:59 |

Higher load peak (3.45 GB vs MAI-UI 2B 1.6 GB) reflects Venus's internal layer-precision retention strategy — the working set settles to 1.2 GB after warmup, but initialization touches more parameters in higher precision.
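To see where the double-quantization saving comes from, the per-weight storage cost can be worked out by hand. The block sizes below are the defaults described in the QLoRA paper (64 weights per first-level block, 256 constants per second-level block) and are assumed here, not confirmed for this bundle.

```python
# Assumed block sizes (QLoRA paper defaults; confirm against the
# bitsandbytes version in use).
WEIGHT_BLOCK = 64      # weights sharing one absmax constant
CONSTANT_BLOCK = 256   # absmax constants sharing one second-level constant


def bits_per_param(double_quant: bool) -> float:
    """Effective storage per quantized weight, in bits."""
    if not double_quant:
        # 4-bit code + one fp32 absmax per 64 weights.
        return 4 + 32 / WEIGHT_BLOCK
    # 4-bit code + one 8-bit absmax per 64 weights
    # + one fp32 second-level constant per 64 * 256 weights.
    return 4 + 8 / WEIGHT_BLOCK + 32 / (WEIGHT_BLOCK * CONSTANT_BLOCK)


plain = bits_per_param(double_quant=False)
nested = bits_per_param(double_quant=True)
saving = (plain - nested) / plain
print(f"{plain:.3f} -> {nested:.3f} bits/param ({saving:.1%} smaller)")
```

For the quantized layers alone this works out to roughly 4.5 versus 4.127 bits per weight, about 8% smaller. The saving over the whole bundle is lower because tensors kept in higher precision are unaffected, which may account for the ~5% figure quoted above.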

Quick start

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

bundle = "B:/Downloads/Portable/NOESIS-VC-ONE/models/llm/NOESIS-Qwen3-VL-2B-UI-Venus-NF4"

processor = AutoProcessor.from_pretrained(bundle)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    bundle,
    device_map={"": 0},          # NEVER "auto" with NF4
    torch_dtype=torch.bfloat16,
).eval()

# Browser DOM screenshot grounding example
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "screenshot.png"},
            {"type": "text",  "text": "Click the 'Subscribe' button."},
        ],
    },
]
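inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True,            # required so **inputs unpacks input_ids, pixel_values, ...
    return_tensors="pt",
).to(0)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
# → predicted bounding box / click coordinates for the Subscribe button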
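The decoded text still has to be turned into a click position. The exact output format is not documented in this card, so the following parser is only a sketch: it assumes the reply contains either a point `x, y` or a box `x1, y1, x2, y2` as plain numbers, and the function name is hypothetical. Adapt it to the format you actually observe.

```python
import re
from typing import Optional, Tuple

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_click_point(text: str) -> Optional[Tuple[float, float]]:
    """Return a click point parsed from model text, or None if none is found.

    Assumption: the first four numbers form a box (x1, y1, x2, y2), or,
    when fewer than four are present, the first two form a point (x, y).
    """
    values = [float(v) for v in NUMBER.findall(text)]
    if len(values) >= 4:
        x1, y1, x2, y2 = values[:4]
        return ((x1 + x2) / 2, (y1 + y2) / 2)  # centre of the box
    if len(values) >= 2:
        return (values[0], values[1])
    return None
```

Whether the coordinates are absolute pixels or normalized to the image size also depends on the model's output convention, which should be checked against the upstream documentation before clicking.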

NOESIS ui-agent.amaimedia.com wiring

| Endpoint tier | Backend | VRAM | Role |
| --- | --- | --- | --- |
| PRIMARY | NOESIS-Qwen3-VL-8B-MAI-UI-NF4 | ~5 GB | Canonical agent: full SOTA quality |
| SECONDARY | NOESIS-Qwen3-VL-2B-MAI-UI-NF4 | ~1.6 GB | Low-VRAM fallback for low-spec clients |
| FALLBACK | THIS bundle | ~1.2 GB target / 3.45 GB load peak | Cross-validation via alternative 4-stage RFT pipeline |

When to invoke FALLBACK (Venus) over PRIMARY/SECONDARY (MAI-UI):

  • MAI-UI returns ambiguous coordinates (low confidence)
  • Mobile-specific tasks (Venus has dedicated AndroidWorld/AndroidLab specialists)
  • A/B-comparison runs for grounding-quality regression tests
  • Independent second-opinion before committing destructive UI action
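The four conditions above can be expressed as a single predicate. This is a hypothetical sketch: the `AgentRequest` fields and the 0.5 confidence threshold are illustrative and do not come from the NOESIS codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRequest:
    """Hypothetical request descriptor; field names are illustrative."""
    mai_ui_confidence: Optional[float] = None  # None: MAI-UI has not answered yet
    is_mobile_task: bool = False
    is_regression_run: bool = False
    is_destructive_action: bool = False


def should_invoke_fallback(request: AgentRequest, min_confidence: float = 0.5) -> bool:
    """Return True when any of the four listed conditions holds."""
    low_confidence = (
        request.mai_ui_confidence is not None
        and request.mai_ui_confidence < min_confidence
    )
    return (
        low_confidence
        or request.is_mobile_task
        or request.is_regression_run
        or request.is_destructive_action
    )
```

Keeping the rule in one function makes it easy to log why Venus was consulted and to adjust the threshold during regression runs.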

Sealed rules (NOESIS DHCF-FNO)

  • R-APACHE-CLEAN — Apache 2.0 preserved end-to-end (Qwen Team → Inclusion AI Venus Team → AMAImedia BF16 repack → AMAImedia NF4 quant).
  • R-NF4-DEVICE-MAP-EXPLICIT — must load with device_map={"": 0}; never device_map="auto" with NF4.
  • R-AGENT-PRIMARY-MAI-UI-8B-NF4 — MAI-UI 8B NF4 is PRIMARY, MAI-UI 2B NF4 is SECONDARY, this Venus 2B is FALLBACK (alternative pipeline cross-validation).
  • R-UI-VENUS-FALLBACK-TO-MAI-UI — Venus is fallback / cross-validation track; MAI-UI is canonical agent path.
  • R-VENUS-15-4STAGE-PIPELINE — 4-stage post-training: Mid-Train (10B GUI tok, 30+ datasets) → Offline-RL → Online-RL → Model Merge.
  • R-VENUS-RFT-TRAINED — Reinforcement Fine-Tuning (RFT) per arXiv 2508.10833.
  • R-QWEN3-VL-MROPE-INTERLEAVED — mRoPE [24, 20, 20] interleaved with rope_theta 5M (text); 256K context capable.
  • R-UI-AGENT-PRODUCT-SCOPE — mounted on ui-agent.amaimedia.com (browser DOM automation), NOT the dubbing pipeline core path.
  • R-VENDORED-INTERNAL — plain LICENSE preserved (BF16-tier NOTICE blocks) alongside LICENSE.md (NF4-tier NOTICE).
  • R-THIRD-PARTY-WRAPPERS-ONLY — Phase 1 SCOPE LOCK — third-party + wrappers only, no own training.
  • R-VISION-TOWER-RETAINED — full Qwen3-VL ViT preserved (depth 24, deepstack at [5,11,17]) — required for screenshot grounding.
  • R-QWEN-VOCAB-151936 — compatible within Qwen3 family.
  • R-UI-AGENT-OUT-OF-SCOPE — NOT in NOESIS dubbing-pipeline core path. Reserved for Phase 2 desktop agent / auto-clipper UI navigation experiments.
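A rule such as R-NF4-DEVICE-MAP-EXPLICIT is easy to enforce with a guard before loading. The helper below is a sketch of one possible check, not part of transformers or NOESIS; it interprets the rule as "the whole model is mapped to a single explicit device", which is an assumption.

```python
def check_nf4_device_map(device_map: object) -> dict:
    """Reject device maps that violate R-NF4-DEVICE-MAP-EXPLICIT."""
    if isinstance(device_map, str):
        # Covers "auto" and the other string shortcuts.
        raise ValueError(
            f'device_map="{device_map}" is not allowed with NF4; use {{"": 0}}'
        )
    if not isinstance(device_map, dict) or set(device_map) != {""}:
        raise ValueError('device_map must map the whole model, e.g. {"": 0}')
    return device_map
```

Calling `check_nf4_device_map({"": 0})` returns the map unchanged, so the result can be passed straight to `from_pretrained(..., device_map=...)`.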

NOESIS provenance

| Step | Source / output |
| --- | --- |
| Base architecture | Qwen/Qwen3-VL-2B (© Alibaba Cloud / Qwen Team 2025-2026, Apache 2.0) |
| GUI agent fine-tune | inclusionAI/UI-Venus-1.5-2B (© Inclusion AI / Ant Group Venus Team 2025-2026, Apache 2.0) |
| Training pipeline | 4-stage: Mid-Train (10B GUI tok) → Offline-RL → Online-RL → Merge (RFT per arXiv 2508.10833) |
| BF16 dtype-repack (intermediate) | UI-Venus-1.5-2B-NOESIS-BF16 (© AMAImedia 2026, Apache 2.0, 4.6 GB) |
| NF4 quantization | bitsandbytes 0.49.2 + double-quant + bf16 compute |
| Local file | model.safetensors (2.19 GB) + config.json + processor + tokenizer |
| Quant date | 2026-05-21 08:32:59 |
| NOESIS version | v15.8 |
| Renamed from | NOESIS-UI-Venus-2B-NF4 (per R-NOESIS-FOLDER-NAMING-PREFIX-QWEN3-VL) |
| Production endpoint | ui-agent.amaimedia.com (Phase 2 subdomain) |
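To check what a downloaded model.safetensors actually stores, the file header can be read without loading any weights. The sketch below relies on the safetensors layout (an 8-byte little-endian header length followed by a JSON header); the function name is illustrative.

```python
import json
import struct
from collections import Counter


def safetensors_dtype_counts(path: str) -> Counter:
    """Count tensors per dtype by reading only the safetensors header."""
    with open(path, "rb") as handle:
        # First 8 bytes: little-endian uint64 giving the JSON header length.
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len))
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
```

For a bitsandbytes 4-bit bundle, packed weights typically appear as `U8` tensors alongside higher-precision tensors for layers that were not quantized, so the counts give a quick sanity check of the quantization.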

Reference docs:

  • NOESIS CLAUDE.md GOLDEN RULE 2 (NF4 device_map={"":0})
  • NOESIS sealed rule R-AGENT-PRIMARY-MAI-UI-8B-NF4
  • NOESIS_NF4_MANIFEST.json in this folder
  • arXiv 2602.09082 (Venus 1.5 Technical Report)
  • arXiv 2508.10833 (Venus RFT Technical Report)

Citation

@misc{venusteam2026uivenus15technicalreport,
      title={UI-Venus-1.5 Technical Report},
      author={Venus Team and Changlong Gao and Zhangxuan Gu and Yulin Liu
              and Xinyu Qiu and Shuheng Shen and Yue Wen and Tianyu Xia
              and Zhenyu Xu and Zhengwen Zeng and Beitong Zhou and
              Xingran Zhou and Weizhi Chen and Sunhao Dai and Jingya Dou
              and Yichen Gong and Yuan Guo and Zhenlin Guo and Feng Li
              and Qian Li and Jinzhen Lin and Yuqi Zhou and Linchao Zhu
              and Liang Chen and Zhenyu Guo and Changhua Meng and
              Weiqiang Wang},
      year={2026},
      eprint={2602.09082},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.09082}
}

@misc{gu2025uivenustechnicalreportbuilding,
      title={UI-Venus Technical Report: Building High-performance UI Agents
             with RFT},
      author={Zhangxuan Gu and Zhengwen Zeng and Zhenyu Xu and Xingran Zhou
              and Shuheng Shen and Yunfei Liu and Beitong Zhou and Changhua
              Meng and Tianyu Xia and Weizhi Chen and Yue Wen and Jingya Dou
              and Fei Tang and Jinzhen Lin and Yulin Liu and Zhenlin Guo
              and Yichen Gong and Heng Jia and Changlong Gao and Yuan Guo
              and Yong Deng and Zhenyu Guo and Liang Chen and Weiqiang Wang},
      year={2025},
      eprint={2508.10833},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.10833}
}

@misc{noesis2026qwen3vl2buivenusnf4,
  title  = {NOESIS DHCF-FNO :: Qwen3-VL-2B-UI-Venus NF4 — FALLBACK
            agent on ui-agent.amaimedia.com},
  author = {Bolotnikov, Ilia and AMAImedia},
  year   = {2026},
  note   = {NF4 (bitsandbytes) quantization derivative of
            inclusionAI/UI-Venus-1.5-2B (via AMAImedia BF16 repack),
            Apache 2.0. 2.19 GB on disk, 1.2 GB VRAM target on RTX 3060.},
  url    = {https://amaimedia.com}
}

License

Apache License 2.0. Qwen3-VL base architecture © Alibaba Cloud / Qwen Team. UI-Venus-1.5-2B 4-stage RFT fine-tune © Inclusion AI / Ant Group (Venus Team). BF16 dtype-repack + NF4 quantization + NOESIS bundling + sealed-rule wiring: © AMAImedia (NOESIS DHCF-FNO project) 2026.

Commercial use is permitted subject to the standard Apache 2.0 preservation requirements (copyright + LICENSE + NOTICE-equivalent attribution must travel with redistributions). See LICENSE (plain text, BF16-tier NOTICE blocks) and LICENSE.md (Markdown, NF4-tier NOTICE) in this folder for the full attribution chain.


Author


Produced 2026-05-21 by NOESIS DHCF-FNO v15.8 — AMAImedia.com
