You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Qwen3.5-4B GUI Grounding — v1 (SFT LoRA)

LoRA adapter for Qwen3.5-4B fine-tuned on GUI grounding: given a screenshot and a natural language instruction, predict the (x, y) click coordinate of the target UI element.

Results — ScreenSpot-V2

Split	Accuracy
Overall	92.5%

Training Data

~23.5K samples from 3 GUI grounding datasets covering desktop, web, and mobile platforms.

Output Format

<|box_start|>(x,y)<|box_end|>

Coordinates are in [0, 1000] normalized space. To convert to pixel coordinates:

pixel_x = x / 1000 * image_width
pixel_y = y / 1000 * image_height

Usage

Requires transformers>=5.2.0 and peft.

from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from peft import PeftModel
import torch

base = Qwen3_5ForConditionalGeneration.from_pretrained("Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "dabism23/qwen35-gui-grounding")
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B")

Access

Model weights are gated. Request access to download. Training configuration details are included with the model files.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mdabis/qwen35-gui-grounding

Base model

Qwen/Qwen3.5-4B-Base

Finetuned

Qwen/Qwen3.5-4B

Adapter

(221)

this model

mdabis
/

qwen35-gui-grounding