Mano-CUA-4B-Thinking-1.1-MLX-8bit
Mano-CUA is the Computer Use Agent (CUA) model in the Mano open-source model series. It is a GUI-VLA (Visual Language Agent) model designed for edge devices that autonomously completes complex desktop GUI operations through visual understanding.
This is the MLX 8-bit quantized version, optimized for Apple Silicon (Mac mini / MacBook). For the full-precision fp16 version, see Mano-CUA-4B-Thinking-1.1.
Main Capabilities
- Complex GUI Automation: Autonomously operate complex interfaces containing hundreds of interactive elements
- Cross-System Data Integration: Extract and integrate multi-source data through purely visual interaction, without requiring APIs
- Long-Task Planning Execution: Support enterprise-level business process automation spanning dozens to hundreds of steps
- Intelligent Report Generation: Automatically generate structured documents such as data analysis reports and work summaries
Technical Background
Mano-CUA builds on the complete technical framework of the Mano project (see the Mano Technical Report). It combines the Mano-Action bidirectional self-reinforcement learning method, three-stage progressive training (SFT → offline reinforcement learning → online reinforcement learning), a "think-act-verify" reasoning loop, and a closed-loop data circulation system to achieve high-precision GUI understanding and operation. The edge version is optimized through mixed-precision quantization, visual token pruning, and edge inference adaptation, so that large-parameter models run efficiently on edge devices such as Mac mini, MacBook, and computing sticks.
Quick Start
Requirements
- macOS with Apple Silicon (M1+)
- Python >= 3.12
Installation
With Cider (recommended, includes W8A8 acceleration on M5+):
pip install mlx-vlm
pip install git+https://github.com/Mininglamp-AI/cider.git
Without Cider:
pip install mlx-vlm
Single-Step Demo
import mlx_vlm as pm
from vlm_service import custom_generate
from PIL import Image
# 1. Load model
model, processor = pm.load("Mininglamp-2718/Mano-CUA-4B-Thinking-1.1-MLX-8bit")
# 2. Load a screenshot
img = Image.open("screenshot.png")
ratio = 1280 / img.width
img = img.resize((1280, int(img.height * ratio)), Image.LANCZOS)
# 3. Build prompt
task = "Click the search bar and type hello"
prompt_text = f"""You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
<action>action</action>
## Action Space
open_app(app_name='') # Open an application by name.
open_url(url='') # Open a URL in the browser.
click(start_box='<|box_start|>(x1,y1)<|box_end|>')
type(content='') # type the content.
hotkey(key='') # Trigger a keyboard shortcut.
scroll(start_box='<|box_start|>(x1,y1)<|box_end|>', direction='down or up or right or left', amount='scroll_amount')
drag(start_box='<|box_start|>(x1,y1)<|box_end|>', end_box='<|box_start|>(x3,y3)<|box_end|>')
wait(duration='') # Sleep for specified duration (in seconds).
finish() # The task is completed.
stop(reason='') # If the item can not found in the image, give the reason
## User Instruction
{task}"""
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt_text},
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
prompt = prompt.replace("<image>", "<|vision_start|><|image_pad|><|vision_end|>")
# 4. Run inference
result = custom_generate(
    model, processor, prompt,
    [img],
    max_tokens=512,
    temperature=0.0,
    prefill_step_size=2048,
)
print(f"Tokens: {result.generation_tokens}, Speed: {result.generation_tps:.1f} tok/s")
print(result.text)
Output Format
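The model outputs XML-style tags: its reasoning in <think>, a short description of the step in <action_desp>, and the action call in <action>: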
<think>The search bar is at the top of the page...</think>
<action_desp>Click the search bar to focus it</action_desp>
<action>click(start_box='<|box_start|>(500,38)<|box_end|>')</action>
Coordinates are normalized to the [0, 1000] range. To convert them to pixel coordinates:
pixel_x = int(x / 1000 * screen_width)
pixel_y = int(y / 1000 * screen_height)
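The conversion above can be combined with tag extraction in a small helper. The following is a minimal sketch, not part of the Mano tooling: the function names `extract_action` and `boxes_to_pixels` are hypothetical, and the regular expressions assume the output looks exactly like the example shown above (integer coordinates, one `<action>` tag).

```python
import re

# Assumed pattern for a point written as <|box_start|>(x,y)<|box_end|>.
BOX_RE = re.compile(r"<\|box_start\|>\((\d+),\s*(\d+)\)<\|box_end\|>")
# Assumed pattern for the <action>...</action> tag in the model output.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(output_text):
    """Return the contents of the first <action> tag, or None if absent."""
    match = ACTION_RE.search(output_text)
    return match.group(1).strip() if match else None


def boxes_to_pixels(action, screen_width, screen_height):
    """Convert every normalized (x, y) point in an action to pixel coordinates."""
    return [
        (int(int(x) / 1000 * screen_width), int(int(y) / 1000 * screen_height))
        for x, y in BOX_RE.findall(action)
    ]
```

For the example output above on a 1920x1080 screen, `boxes_to_pixels(extract_action(result.text), 1920, 1080)` would give `[(960, 41)]`. A `drag` action yields two points, the start and the end.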
W8A8 Acceleration (M5+ only)
On Apple M5 or later, enable INT8 acceleration for ~15-19% faster prefill:
from cider import convert_model, is_available

if is_available():
    convert_model(model.language_model)
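Because Cider is an optional install, a script that should run on both setups may want to guard the import. The sketch below is one way to do that; the wrapper name `maybe_enable_w8a8` is hypothetical, and it assumes only the two Cider functions used above (`convert_model` and `is_available`), called the same way.

```python
def maybe_enable_w8a8(model):
    """Enable W8A8 acceleration when Cider is installed and reports support.

    Returns True if convert_model was called, False otherwise.
    """
    try:
        # Cider is optional; fall back quietly when it is not installed.
        from cider import convert_model, is_available
    except ImportError:
        return False
    if not is_available():
        return False
    # Same call as in the snippet above.
    convert_model(model.language_model)
    return True
```

Calling `maybe_enable_w8a8(model)` right after loading the model leaves it unchanged when Cider is missing or reports that acceleration is unavailable.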
Full Action Space
| Action | Syntax | Description |
|---|---|---|
| open_app | `open_app(app_name='')` | Open an application |
| open_url | `open_url(url='')` | Open a URL |
| click | `click(start_box='<\|box_start\|>(x,y)<\|box_end\|>')` | Left click |
| doubleclick | `doubleclick(start_box='<\|box_start\|>(x,y)<\|box_end\|>')` | Double click |
| triple_click | `triple_click(start_box='<\|box_start\|>(x,y)<\|box_end\|>')` | Triple click (select line) |
| right_single | `right_single(start_box='<\|box_start\|>(x,y)<\|box_end\|>')` | Right click |
| hover | `hover(start_box='<\|box_start\|>(x,y)<\|box_end\|>')` | Mouse hover |
| type | `type(content='text')` | Type text |
| hotkey | `hotkey(key='cmd+c')` | Keyboard shortcut |
| hotkey_click | `hotkey_click(start_box='<\|box_start\|>(x,y)<\|box_end\|>', key='shift')` | Modifier + click |
| scroll | `scroll(start_box='<\|box_start\|>(x,y)<\|box_end\|>', direction='down', amount='3')` | Scroll |
| drag | `drag(start_box='<\|box_start\|>(x1,y1)<\|box_end\|>', end_box='<\|box_start\|>(x2,y2)<\|box_end\|>')` | Drag and drop |
| wait | `wait(duration='2')` | Wait (seconds) |
| finish | `finish()` | Task completed |
| stop | `stop(reason='...')` | Task infeasible |
| call_user | `call_user()` | Request human help |
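To act on these calls, an executor first has to split an action string into its name and arguments. The following is a minimal parsing sketch under stated assumptions: every argument is a single-quoted keyword value as in the table above, and values contain no unescaped single quotes. `parse_action` and `KNOWN_ACTIONS` are hypothetical names, not part of the Mano tooling.

```python
import re

# Action names taken from the table above.
KNOWN_ACTIONS = {
    "open_app", "open_url", "click", "doubleclick", "triple_click",
    "right_single", "hover", "type", "hotkey", "hotkey_click",
    "scroll", "drag", "wait", "finish", "stop", "call_user",
}

# name(...) with everything between the outer parentheses captured.
CALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)
# key='value' pairs; assumes values hold no unescaped single quotes.
KWARG_RE = re.compile(r"(\w+)='((?:[^'\\]|\\.)*)'")


def parse_action(action):
    """Split "name(key='value', ...)" into (name, {key: value})."""
    match = CALL_RE.match(action)
    if match is None:
        raise ValueError(f"Unrecognized action syntax: {action!r}")
    name, arg_text = match.groups()
    if name not in KNOWN_ACTIONS:
        raise ValueError(f"Unknown action name: {name!r}")
    return name, dict(KWARG_RE.findall(arg_text))
```

All values come back as strings, so numeric arguments such as `amount` or `duration` still need converting before use. Raising on unknown names keeps malformed model output from reaching the code that moves the mouse or types.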
Other Versions
| Version | Repo | Description |
|---|---|---|
| fp16 | Mano-CUA-4B-Thinking-1.1 | Full precision, for archival / re-quantization / GPU inference |
| MLX-8bit (this) | Mano-CUA-4B-Thinking-1.1-MLX-8bit | MLX 8-bit quantized, recommended for Apple Silicon local inference |
Contact
- Website: https://github.com/Mininglamp-AI/Mano-P
- Email: model@mininglamp.com