Instructions to use AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4 with libraries and local apps.

Libraries

How to use AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4")
model = AutoModelForImageTextToText.from_pretrained("AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
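
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. `build_chat_payload` and `post_chat` are hypothetical helper names, and the code assumes a server is already listening on the given base URL.

```python
import json
import urllib.request

MODEL_ID = "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4"


def build_chat_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, the reply text should be found under `["choices"][0]["message"]["content"]` of the returned dictionary, as in other OpenAI-compatible responses.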

Use Docker

docker model run hf.co/AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4

SGLang

How to use AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AMAImedia/Qwen3-VL-2B-UI-Venus-NOESIS-NF4",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'


NOESIS-Qwen3-VL-2B-UI-Venus-NF4 (NOESIS DHCF-FNO bundle)

NF4-quantized derivative of inclusionAI/UI-Venus-1.5-2B, an end-to-end GUI agent trained on top of Qwen3-VL-2B with a 4-stage post-training pipeline (Mid-Train → Offline-RL → Online-RL → Model-Merge). Quantized with bitsandbytes 0.49.2 (double quantization, bf16 compute) from the intermediate AMAImedia BF16 repack, UI-Venus-1.5-2B-NOESIS-BF16.

Used inside the NOESIS DHCF-FNO stack as the FALLBACK 2B agent on the public ui-agent.amaimedia.com subdomain (browser DOM automation), providing alternative-pipeline cross-validation against the PRIMARY MAI-UI 8B NF4 path per R-AGENT-PRIMARY-MAI-UI-8B-NF4.

✅ APACHE 2.0: COMMERCIAL USE PERMITTED. Clean end-to-end lineage (Alibaba Cloud / Qwen Team Apache 2.0 → Inclusion AI / Ant Group Venus Team Apache 2.0 → AMAImedia BF16 repack Apache 2.0 → AMAImedia NF4 Apache 2.0). Loads through the standard transformers from_pretrained API with device_map={"": 0} (an NF4 requirement per CLAUDE.md GOLDEN RULE 2).

Released as part of the NOESIS Professional Multilingual Dubbing Automation Platform (framework: DHCF-FNO — Deterministic Hybrid Control Framework for Frozen Neural Operators).

Founder: Ilia Bolotnikov
Organization: AMAImedia.com
X (Twitter): @AMAImediacom
LinkedIn: Ilia Bolotnikov
Telegram: @AMAImediacom
NOESIS version: v15.8
Quantization date: 2026-05-21 08:32:59
Renamed from: NOESIS-UI-Venus-2B-NF4 (per R-NOESIS-FOLDER-NAMING-PREFIX-QWEN3-VL)

NOESIS role — fallback 2B agent on ui-agent.amaimedia.com

Browser DOM automation agent mounted on ui-agent.amaimedia.com (Phase 2 desktop agent / auto-clipper UI nav subdomain) as the FALLBACK tier in a 3-tier hierarchy. Provides cross-validation against MAI-UI's grounding output through a fundamentally different training pipeline (4-stage RFT vs self-evolving data + device-cloud collab).

ui-agent.amaimedia.com (browser DOM automation)
        │
        ├── PRIMARY  : NOESIS-Qwen3-VL-8B-MAI-UI-NF4    (Tongyi MAI-UI 8B, ~5 GB VRAM)
        │              R-AGENT-PRIMARY-MAI-UI-8B-NF4
        │
        ├── SECONDARY: NOESIS-Qwen3-VL-2B-MAI-UI-NF4    (Tongyi MAI-UI 2B, ~1.6 GB VRAM)
        │              low-VRAM fallback for primary
        │
        └── FALLBACK : NOESIS-Qwen3-VL-2B-UI-Venus-NF4  (this, Inclusion AI Venus 1.5 2B)
                       • alternative 4-stage RFT pipeline
                       • cross-validation against MAI-UI
                       • ~1.2 GB target / 3.45 GB load peak VRAM

The Venus variant exists as the cross-validation track: when MAI-UI gives ambiguous click coordinates or fails to ground an element, Venus provides a second opinion from a separately trained model (4-stage RFT pipeline, versus MAI-UI's self-evolving data and device-cloud collaboration).
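
One way to picture the second-opinion step is as an agreement check between two predicted click points. The sketch below is illustrative only: `cross_validate_click`, the `tolerance_px` value, and the returned labels are assumptions, not part of the NOESIS stack.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def cross_validate_click(
    primary: Optional[Point],
    fallback: Optional[Point],
    tolerance_px: float = 20.0,  # hypothetical agreement radius in pixels
) -> Tuple[str, Optional[Point]]:
    """Compare two click predictions and report whether they agree."""
    if primary is None and fallback is None:
        return "no_grounding", None
    if primary is None:
        # The primary model failed to ground the element; use the second opinion.
        return "fallback_only", fallback
    if fallback is None:
        return "primary_only", primary
    distance = math.dist(primary, fallback)
    if distance <= tolerance_px:
        return "agree", primary
    # The models disagree; leave the decision to the caller.
    return "disagree", None
```

Returning a label instead of silently picking one point keeps the disagreement visible, which matters most before a destructive UI action.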

| Property | Value |
|---|---|
| Immediate parent | `UI-Venus-1.5-2B-NOESIS-BF16` (AMAImedia BF16 repack of `inclusionAI/UI-Venus-1.5-2B`) |
| Upstream lineage | `Qwen/Qwen3-VL-2B` (Apache 2.0) → `inclusionAI/UI-Venus-1.5-2B` (Apache 2.0) → AMAImedia BF16 repack → AMAImedia NF4 |
| Architecture | `Qwen3VLForConditionalGeneration` (multimodal, vision tower retained) |
| Text hidden | 2 048 / 28 layers / 16 heads (GQA 2 : 1, 8 kv heads) |
| Vision tower | depth 24, hidden 1024, patch 16, deepstack at layers [5,11,17] |
| Vocab size | 151 936 |
| Context | 262 144 (mRoPE [24,20,20] interleaved, rope_theta 5M) |
| Format | NF4 (bnb 4-bit, double-quant, bf16 compute) |
| Bundle size on disk | 2.19 GB (single safetensors) |
| VRAM target (inference) | 1.2 GB ✅ RTX 3060 6 GB |
| VRAM peak (load) | 3.45 GB |
| License | Apache 2.0 (commercial-ok) |
| Project page | https://ui-venus.github.io/UI-Venus-1.5 |
| Papers | arxiv:2602.09082 (Venus 1.5), arxiv:2508.10833 (Venus RFT) |

Upstream Venus Team documentation (preserved)

UI-Venus-1.5-2B — End-to-end GUI Agent

UI-Venus-1.5 is a unified end-to-end GUI Agent designed for robust real-world applications. The model family includes two dense variants (2B / 8B) and one MoE variant (30B-A3B). This folder hosts the 2B dense variant, NF4-quantized.

Training pipeline (4 stages)

Stage 1 — Mid-Training
  10B tokens across 30+ GUI datasets
  Foundational GUI semantics

Stage 2 — Offline-RL
  Task-specific optimization:
    • grounding (ScreenSpot / OSWorld-G / VenusBench-GD)
    • mobile  (AndroidWorld / AndroidLab / VenusBench-Mobile)
    • web     (WebVoyager / OSWorld-W)

Stage 3 — Online-RL
  Full-trajectory rollouts for long-horizon dynamic navigation
  RFT (Reinforcement Fine-Tuning) per arXiv 2508.10833

Stage 4 — Model Merge
  Unifying specialists into single deployable checkpoint

Benchmarks (per upstream Venus Team report)

| Benchmark | 30B-A3B variant | This 2B variant |
|---|---|---|
| ScreenSpot-Pro | 69.6% | 57.7% |
| VenusBench-GD | 75.0% | not listed here |
| OSWorld-G-R | 76.4% | not listed here |
| OSWorld-G | 70.6% | not listed here |
| UI-Vision | 54.7% | not listed here |
| AndroidWorld | 77.6% | not listed here |
| AndroidLab | 55.1% / 68.1% | not listed here |
| VenusBench-Mobile | 21.5% | not listed here |
| WebVoyager | 76.0% | not listed here |

Numbers are taken from the upstream Venus Team report. Apart from ScreenSpot-Pro, 2B scores are not reproduced in this card; per the "Consistent Scaling Gains" note in the upstream README, expect them to be lower than the 30B-A3B figures.

Quantization details (NOESIS-side)

| Parameter | Value |
|---|---|
| Library | `bitsandbytes` 0.49.2 |
| Method | NF4 (Normalized Float 4-bit) |
| `bnb_4bit_use_double_quant` | True (saves ~5% via nested quant) |
| `bnb_4bit_compute_dtype` | bfloat16 |
| Device map | `{"": 0}` (R-NF4-DEVICE-MAP-EXPLICIT) |
| Source dir | `D:\models\vlm-gui-mot\UI-Venus-1.5-2B-NOESIS-BF16` |
| Output disk size | 2.19 GB (single safetensors) |
| VRAM target (inference) | 1.2 GB |
| VRAM peak (load) | 3.45 GB |
| Quant date | 2026-05-21 08:32:59 |
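
The settings in the table map onto a transformers `BitsAndBytesConfig` roughly as follows. This is a sketch of how such a quantization could be configured, not the script used to produce this bundle; `NF4_SETTINGS` and `build_bnb_config` are illustrative names.

```python
# Quantization settings from the table above, kept as plain data.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_bnb_config():
    """Turn NF4_SETTINGS into a transformers BitsAndBytesConfig.

    Imports are deferred so the settings can be inspected without
    torch or transformers installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    settings = dict(NF4_SETTINGS)
    # Resolve the dtype name ("bfloat16") to the torch dtype object.
    settings["bnb_4bit_compute_dtype"] = getattr(torch, settings["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**settings)
```

Passing the result as `quantization_config=` to `from_pretrained` on the BF16 checkpoint, together with `device_map={"": 0}`, would quantize at load time. The published bundle is already quantized, so loading it should not need this step.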

The load peak is higher than MAI-UI 2B's (3.45 GB vs 1.6 GB), which reflects Venus's internal layer-precision retention strategy: initialization touches more parameters in higher precision, and the working set settles to 1.2 GB after warmup.
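
Because the load peak is higher than the steady-state target, a pre-flight check should compare free VRAM against the 3.45 GB peak rather than the 1.2 GB target. A minimal sketch, with `has_load_headroom`, `free_vram_gb`, and the `margin_gb` safety margin as assumptions:

```python
LOAD_PEAK_GB = 3.45    # VRAM peak (load), from the table above
STEADY_STATE_GB = 1.2  # VRAM target (inference), from the table above


def has_load_headroom(free_gb: float, margin_gb: float = 0.5) -> bool:
    """Return True if free VRAM covers the load peak plus a safety margin."""
    return free_gb >= LOAD_PEAK_GB + margin_gb


def free_vram_gb(device: int = 0) -> float:
    """Query free VRAM on a CUDA device (requires torch with CUDA).

    The result is in GiB, which is close enough for a headroom check.
    """
    import torch

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 1024**3
```

On a machine with a GPU, `has_load_headroom(free_vram_gb())` could gate the `from_pretrained` call so that a too-small device fails early instead of partway through loading.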

Quick start

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Local bundle path; adjust to wherever the bundle is stored.
bundle = "B:/Downloads/Portable/NOESIS-VC-ONE/models/llm/NOESIS-Qwen3-VL-2B-UI-Venus-NF4"

processor = AutoProcessor.from_pretrained(bundle)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    bundle,
    device_map={"": 0},          # NEVER "auto" with NF4
    torch_dtype=torch.bfloat16,
).eval()

# Browser DOM screenshot grounding example
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "screenshot.png"},
            {"type": "text",  "text": "Click the 'Subscribe' button."},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,            # required so **inputs unpacks into generate()
    return_tensors="pt",
).to(0)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
# → predicted bounding box / click coordinates for the Subscribe button
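
The decoded text still has to be turned into a click position. The exact output format is defined by the upstream model and is not documented in this card, so the sketch below makes a loud assumption: that the reply contains a box as four numbers such as `[x1, y1, x2, y2]`. `box_center` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple


def box_center(reply: str) -> Optional[Tuple[float, float]]:
    """Return the center of the first four-number box found in reply.

    Assumes a format like "[x1, y1, x2, y2]" (an assumption, not a documented
    output format); returns None if no such box is present.
    """
    number = r"(-?\d+(?:\.\d+)?)"
    pattern = r"\[\s*" + r"\s*,\s*".join([number] * 4) + r"\s*\]"
    match = re.search(pattern, reply)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(value) for value in match.groups())
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
```

If the model turns out to emit a different format, only the pattern needs to change; returning `None` on a failed match lets the caller treat it as a grounding failure.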

NOESIS ui-agent.amaimedia.com wiring

| Endpoint tier | Backend | VRAM | Role |
|---|---|---|---|
| PRIMARY | `NOESIS-Qwen3-VL-8B-MAI-UI-NF4` | ~5 GB | Canonical agent, full SOTA quality |
| SECONDARY | `NOESIS-Qwen3-VL-2B-MAI-UI-NF4` | ~1.6 GB | Low-VRAM fallback for low-spec clients |
| FALLBACK | THIS bundle | ~1.2 GB target / 3.45 GB load peak | Cross-validation via alternative 4-stage RFT pipeline |

When to invoke FALLBACK (Venus) over PRIMARY/SECONDARY (MAI-UI):

- MAI-UI returns ambiguous coordinates (low confidence)
- Mobile-specific tasks (Venus has dedicated AndroidWorld/AndroidLab specialists)
- A/B-comparison runs for grounding-quality regression tests
- Independent second opinion before committing a destructive UI action
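
The four triggers above can be expressed as a small predicate. This is a sketch only: `GroundingRequest`, its field names, and the `min_confidence` threshold are hypothetical, and this card does not say how MAI-UI confidence is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundingRequest:
    primary_confidence: Optional[float]  # None if MAI-UI failed to ground
    is_mobile_task: bool = False
    is_regression_run: bool = False
    is_destructive_action: bool = False


def should_invoke_fallback(request: GroundingRequest, min_confidence: float = 0.5) -> bool:
    """Return True if any of the four listed triggers applies."""
    ambiguous = (
        request.primary_confidence is None
        or request.primary_confidence < min_confidence
    )
    return (
        ambiguous
        or request.is_mobile_task
        or request.is_regression_run
        or request.is_destructive_action
    )
```

For example, a confident MAI-UI result on a desktop task would skip Venus, while the same result ahead of a destructive action would still trigger the second opinion.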

Sealed rules (NOESIS DHCF-FNO)

R-APACHE-CLEAN — Apache 2.0 preserved end-to-end (Qwen Team → Inclusion AI Venus Team → AMAImedia BF16 repack → AMAImedia NF4 quant).
R-NF4-DEVICE-MAP-EXPLICIT — must load with device_map={"": 0}; never device_map="auto" with NF4.
R-AGENT-PRIMARY-MAI-UI-8B-NF4 — MAI-UI 8B NF4 is PRIMARY, MAI-UI 2B NF4 is SECONDARY, this Venus 2B is FALLBACK (alternative pipeline cross-validation).
R-UI-VENUS-FALLBACK-TO-MAI-UI — Venus is fallback / cross-validation track; MAI-UI is canonical agent path.
R-VENUS-15-4STAGE-PIPELINE — 4-stage post-training: Mid-Train (10B GUI tok, 30+ datasets) → Offline-RL → Online-RL → Model Merge.
R-VENUS-RFT-TRAINED — Reinforcement Fine-Tuning (RFT) per arXiv 2508.10833.
R-QWEN3-VL-MROPE-INTERLEAVED — mRoPE [24, 20, 20] interleaved with rope_theta 5M (text); 256K context capable.
R-UI-AGENT-PRODUCT-SCOPE — mounted on ui-agent.amaimedia.com (browser DOM automation), NOT the dubbing pipeline core path.
R-VENDORED-INTERNAL — plain LICENSE preserved (BF16-tier NOTICE blocks) alongside LICENSE.md (NF4-tier NOTICE).
R-THIRD-PARTY-WRAPPERS-ONLY — Phase 1 SCOPE LOCK — third-party + wrappers only, no own training.
R-VISION-TOWER-RETAINED — full Qwen3-VL ViT preserved (depth 24, deepstack at [5,11,17]) — required for screenshot grounding.
R-QWEN-VOCAB-151936 — compatible within Qwen3 family.
R-UI-AGENT-OUT-OF-SCOPE — NOT in NOESIS dubbing-pipeline core path. Reserved for Phase 2 desktop agent / auto-clipper UI navigation experiments.

NOESIS provenance

| Step | Source / output |
|---|---|
| Base architecture | `Qwen/Qwen3-VL-2B` (© Alibaba Cloud / Qwen Team 2025-2026, Apache 2.0) |
| GUI agent fine-tune | `inclusionAI/UI-Venus-1.5-2B` (© Inclusion AI / Ant Group Venus Team 2025-2026, Apache 2.0) |
| Training pipeline | 4-stage: Mid-Train (10B GUI tok) → Offline-RL → Online-RL → Merge (RFT per arXiv 2508.10833) |
| BF16 dtype-repack (intermediate) | `UI-Venus-1.5-2B-NOESIS-BF16` (© AMAImedia 2026, Apache 2.0, 4.6 GB) |
| NF4 quantization | `bitsandbytes` 0.49.2 + double-quant + bf16 compute |
| Local file | `model.safetensors` (2.19 GB) + `config.json` + processor + tokenizer |
| Quant date | 2026-05-21 08:32:59 |
| NOESIS version | v15.8 |
| Renamed from | `NOESIS-UI-Venus-2B-NF4` (per R-NOESIS-FOLDER-NAMING-PREFIX-QWEN3-VL) |
| Production endpoint | `ui-agent.amaimedia.com` (Phase 2 subdomain) |

Reference docs:

NOESIS CLAUDE.md GOLDEN RULE 2 (NF4 device_map={"":0})
NOESIS sealed rule R-AGENT-PRIMARY-MAI-UI-8B-NF4
NOESIS_NF4_MANIFEST.json in this folder
arXiv 2602.09082 (Venus 1.5 Technical Report)
arXiv 2508.10833 (Venus RFT Technical Report)

Citation

@misc{venusteam2026uivenus15technicalreport,
      title={UI-Venus-1.5 Technical Report},
      author={Venus Team and Changlong Gao and Zhangxuan Gu and Yulin Liu
              and Xinyu Qiu and Shuheng Shen and Yue Wen and Tianyu Xia
              and Zhenyu Xu and Zhengwen Zeng and Beitong Zhou and
              Xingran Zhou and Weizhi Chen and Sunhao Dai and Jingya Dou
              and Yichen Gong and Yuan Guo and Zhenlin Guo and Feng Li
              and Qian Li and Jinzhen Lin and Yuqi Zhou and Linchao Zhu
              and Liang Chen and Zhenyu Guo and Changhua Meng and
              Weiqiang Wang},
      year={2026},
      eprint={2602.09082},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.09082}
}

@misc{gu2025uivenustechnicalreportbuilding,
      title={UI-Venus Technical Report: Building High-performance UI Agents
             with RFT},
      author={Zhangxuan Gu and Zhengwen Zeng and Zhenyu Xu and Xingran Zhou
              and Shuheng Shen and Yunfei Liu and Beitong Zhou and Changhua
              Meng and Tianyu Xia and Weizhi Chen and Yue Wen and Jingya Dou
              and Fei Tang and Jinzhen Lin and Yulin Liu and Zhenlin Guo
              and Yichen Gong and Heng Jia and Changlong Gao and Yuan Guo
              and Yong Deng and Zhenyu Guo and Liang Chen and Weiqiang Wang},
      year={2025},
      eprint={2508.10833},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.10833}
}

@misc{noesis2026qwen3vl2buivenusnf4,
  title  = {NOESIS DHCF-FNO :: Qwen3-VL-2B-UI-Venus NF4 — FALLBACK
            agent on ui-agent.amaimedia.com},
  author = {Bolotnikov, Ilia and AMAImedia},
  year   = {2026},
  note   = {NF4 (bitsandbytes) quantization derivative of
            inclusionAI/UI-Venus-1.5-2B (via AMAImedia BF16 repack),
            Apache 2.0. 2.19 GB on disk, 1.2 GB VRAM target on RTX 3060.},
  url    = {https://amaimedia.com}
}

License

Apache License 2.0. Qwen3-VL base architecture © Alibaba Cloud / Qwen Team. UI-Venus-1.5-2B 4-stage RFT fine-tune © Inclusion AI / Ant Group (Venus Team). BF16 dtype-repack + NF4 quantization + NOESIS bundling + sealed-rule wiring: © AMAImedia (NOESIS DHCF-FNO project) 2026.

Commercial use is permitted subject to the standard Apache 2.0 preservation requirements (copyright + LICENSE + NOTICE-equivalent attribution must travel with redistributions). See LICENSE (plain text, BF16-tier NOTICE blocks) and LICENSE.md (Markdown, NF4-tier NOTICE) in this folder for the full attribution chain.

Author

Founder: Ilia Bolotnikov
Organization: AMAImedia.com
X (Twitter): @AMAImediacom
LinkedIn: Ilia Bolotnikov
Telegram: @AMAImediacom
NOESIS version: v15.8
Quantization date: 2026-05-21 08:32:59
Parent BF16 source: UI-Venus-1.5-2B-NOESIS-BF16 (D:\models\vlm-gui-mot)
Upstream: inclusionAI/UI-Venus-1.5-2B
Vendored component: NOESIS-Qwen3-VL-2B-UI-Venus-NF4 (Apache 2.0)

Produced 2026-05-21 by NOESIS DHCF-FNO v15.8 — AMAImedia.com

