Initial upload
- .gitattributes +2 -0
- Agents-A1-Q4_K_M.gguf +3 -0
- README.md +444 -0
- figures/24px.svg +1 -0
- figures/a1_benchmarks_altair_grid.svg +0 -0
- figures/github-logo.svg +3 -0
- figures/hf-logo.svg +8 -0
- figures/logo_nobg.png +3 -0
- figures/modelscope-logo.svg +12 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Agents-A1-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+figures/logo_nobg.png filter=lfs diff=lfs merge=lfs -text
Agents-A1-Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31aefa25b7e1edbde436e643e2b5e3f6e57820a4811d97b131130e48ff0772c2
+size 21166757632
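The pointer above is what Git stores in place of the real GGUF file: a spec version, a SHA-256 object id, and the payload size in bytes. As a minimal sketch (the helper name `parse_lfs_pointer` is illustrative, not part of any Git LFS tooling), the fields can be read and the size converted to GiB like this:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:31aefa25b7e1edbde436e643e2b5e3f6e57820a4811d97b131130e48ff0772c2
size 21166757632
"""

info = parse_lfs_pointer(POINTER)
size_gib = info["size_bytes"] / 2**30  # bytes -> GiB
print(f"{info['hash_algo']} digest, {size_gib:.2f} GiB")
```

The size is a rough lower bound on the disk space needed to download the quantized model.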
README.md
ADDED
@@ -0,0 +1,444 @@
---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---

# Agents-A1: Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

<div style="display: flex; flex-direction: column; align-items: center; line-height: 1.2;">
<div style="display: flex; justify-content: center; align-items: center; gap: 10px; height: 30px;">
<span style="font-size: 16px;" role="img" aria-label="Homepage">🏠</span>
<a href="https://internscience.github.io/Agents-A1/"><b>Homepage</b></a>
<span style="color: #ccc;">|</span>
<img src="./figures/24px.svg" width="16" height="16" alt="Technical Report" style="filter: invert(0.5);">
<a href="https://arxiv.org/abs/2606.30616"><b>Technical Report</b></a>
</div>

<div style="display: flex; justify-content: center; align-items: center; gap: 10px; height: 30px; margin-top: 2px;">
<img src="./figures/hf-logo.svg" width="16" height="16" alt="Hugging Face">
<a href="https://huggingface.co/InternScience/Agents-A1"><b>Hugging Face</b></a>
<span style="color: #ccc;">|</span>
<img src="./figures/github-logo.svg" width="16" height="16" alt="GitHub">
<a href="https://github.com/InternScience/Agents-A1"><b>GitHub</b></a>
<span style="color: #ccc;">|</span>
<img src="./figures/modelscope-logo.svg" width="16" height="16" alt="Model Scope">
<a href="https://modelscope.cn/models/InternScience/Agents-A1"><b>ModelScope</b></a>
</div>
</div>

> [!Note]
> This repository contains model weights and configuration files for Agents-A1 in the Hugging Face Transformers format.
>
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc.

**Agents-A1** is a 35B Mixture-of-Experts agentic model from [InternScience](https://huggingface.co/InternScience), built to scale heterogeneous agentic abilities across multiple domains, including **Long-horizon Search, Engineering, Scientific Research, Instruction Following, and Tool-calling**. We investigate agent-horizon scaling from two perspectives: scaling long-horizon trajectories and scaling heterogeneous agent abilities.

For long-horizon trajectories, **Agents-A1** is trained with the help of a domain-grounded knowledge-action infrastructure that jointly constructs actions, observations, and verifier outcomes, turning the agent's process into a trainable target. For heterogeneous agent abilities, **Agents-A1** uses a three-stage training paradigm for building a scalable general-purpose agentic model. First, we perform full-domain supervised fine-tuning to align the base model with broad agentic behaviors. Second, we train domain-level teacher models to capture specialized expertise in each domain. Third, we propose multi-teacher multi-domain on-policy distillation with heterogeneity-aware optimization to improve knowledge transfer efficiency across different domains.

![Agents-A1 benchmark overview](./figures/a1_benchmarks_altair_grid.svg)

## Highlights

- **Agentic Reasoning**: Agents-A1 excels at decomposing complex tasks into executable sub-steps, planning ahead, and adapting its strategy based on intermediate results.
- **Tool Use**: Natively supports function calling and tool integration, enabling seamless interaction with APIs, code interpreters, search engines, and other external tools.
- **Scientific and Professional Reasoning**: Handles tool-integrated scientific reasoning and professional knowledge question answering.
- **Instruction Following**: Precisely follows detailed, multi-constraint instructions across diverse domains.

Developers and enterprises are welcome to try Agents-A1, integrate it, and share their feedback.

## Performance

We evaluate Agents-A1 in real-world agentic and research-oriented workflows across six directions: long-horizon search, engineering tasks, scientific research, instruction following, general agentic tasks, and scientific agentic tasks. Despite operating in the ~35B model class, Agents-A1 is highly competitive with frontier-scale systems such as GPT-5.5, DeepSeek-V4-pro, and Kimi-K2.6. It achieves overall SOTA results on several challenging benchmarks, including Seal-0 (56.4), HiPhO (46.4), FrontierScience-Olympiad (79.0), FrontierScience-Research (40.0), IFBench (80.6), and IFEval (94.8), and ranks best among comparable models on a broad range of tasks such as BrowseComp (75.5), XBench-DS-2510 (86.0), GAIA (96.0), SciCode (44.3), HLE with tools (47.6), and MolBench-bind (56.8). These results show that Agents-A1 combines strong long-horizon search ability, robust scientific reasoning, and reliable instruction following, narrowing the gap with much larger frontier models.

<p>
🥇 Overall SOTA
🟢 Best Among Comparable Models (~35B)
</p>

<table>
<thead>
<tr>
<th rowspan="2" align="left">Benchmark</th>
<th colspan="3" align="center" style="text-align:center;">
📏 Comparable Models (~35B)
</th>

<th colspan="4" align="center" style="text-align:center;">
🚀 Larger-scale Models
</th>

<th colspan="1" align="center" style="text-align:center;">
⭐ Ours
</th>
</tr>

<tr>
<th align="center">Qwen3.5-35B-A3B</th>
<th align="center">Qwen3.6-35B-A3B</th>
<th align="center">Nex-N2-mini</th>

<th align="center">Step-3.5-Flash</th>
<th align="center">Kimi-K2.6</th>
<th align="center">DeepSeek-V4-pro(Max)</th>
<th align="center">GPT-5.5(xhigh)</th>

<th align="center">Agents-A1</th>
</tr>
</thead>

<tbody>

<tr>
<td colspan="9" align="left"><b>🔍 Long-horizon Search</b></td>
</tr>

<tr>
<td align="left">BrowseComp</td>
<td align="center">61.0</td>
<td align="center">67.93</td>
<td align="center">74.1</td>
<td align="center">69.0</td>
<td align="center">83.2</td>
<td align="center">83.4</td>
<td align="center">🥇 84.4</td>
<td align="center">🟢 75.51</td>
</tr>

<tr>
<td align="left">XBench-DS-2510</td>
<td align="center">77.0</td>
<td align="center">71.0</td>
<td align="center">82.0</td>
<td align="center">56.3</td>
<td align="center">🥇 90.0</td>
<td align="center">🥇 90.0</td>
<td align="center">84.0</td>
<td align="center">🟢 86.0</td>
</tr>

<tr>
<td align="left">Seal-0</td>
<td align="center">41.4</td>
<td align="center">38.74</td>
<td align="center">49.55</td>
<td align="center">36.94</td>
<td align="center">50.45</td>
<td align="center">54.95</td>
<td align="center">42.34</td>
<td align="center">🥇 56.36</td>
</tr>

<tr>
<td align="left">GAIA</td>
<td align="center">59.8</td>
<td align="center">78.64</td>
<td align="center">82.52</td>
<td align="center">84.5</td>
<td align="center">80.58</td>
<td align="center">🥇 98.06</td>
<td align="center">87.38</td>
<td align="center">🟢 96.04</td>
</tr>

<tr>
<td colspan="9" align="left"><b>⚙️ Engineering Tasks</b></td>
</tr>

<tr>
<td align="left">SciCode</td>
<td align="center">37.7</td>
<td align="center">35.8</td>
<td align="center">29.9</td>
<td align="center">40.4</td>
<td align="center">53.5</td>
<td align="center">50.0</td>
<td align="center">🥇 56.1</td>
<td align="center">🟢 44.33</td>
</tr>

<tr>
<td align="left">MLE-Lite</td>
<td align="center">24.24</td>
<td align="center">34.85</td>
<td align="center">34.85</td>
<td align="center">54.55</td>
<td align="center">62.12</td>
<td align="center">63.64</td>
<td align="center">🥇 72.73</td>
<td align="center">🟢 43.94</td>
</tr>

<tr>
<td colspan="9" align="left"><b>🧪 Scientific Research</b></td>
</tr>

<tr>
<td align="left">HLE w/ tools</td>
<td align="center">47.4</td>
<td align="center">36.2</td>
<td align="center">32.0</td>
<td align="center">23.1</td>
<td align="center">🥇 54.0</td>
<td align="center">48.2</td>
<td align="center">52.2</td>
<td align="center">🟢 47.6</td>
</tr>

<tr>
<td align="left">HiPhO</td>
<td align="center">37.0</td>
<td align="center">37.7</td>
<td align="center">38.5</td>
<td align="center">38.3</td>
<td align="center">41.1</td>
<td align="center">38.7</td>
<td align="center">43.3</td>
<td align="center">🥇 46.4</td>
</tr>

<tr>
<td align="left">FrontierScience-Olympiad</td>
<td align="center">64.5</td>
<td align="center">60.3</td>
<td align="center">52.0</td>
<td align="center">61.0</td>
<td align="center">73.0</td>
<td align="center">76.0</td>
<td align="center">78.0</td>
<td align="center">🥇 79.0</td>
</tr>

<tr>
<td align="left">FrontierScience-Research</td>
<td align="center">2.5</td>
<td align="center">2.9</td>
<td align="center">5.0</td>
<td align="center">6.7</td>
<td align="center">17.9</td>
<td align="center">13.3</td>
<td align="center">26.7</td>
<td align="center">🥇 40.0</td>
</tr>

<tr>
<td colspan="9" align="left"><b>📋 Instruction Following</b></td>
</tr>

<tr>
<td align="left">IFBench</td>
<td align="center">70.2</td>
<td align="center">64.4</td>
<td align="center">54.08</td>
<td align="center">64.6</td>
<td align="center">71.77</td>
<td align="center">73.47</td>
<td align="center">75.9</td>
<td align="center">🥇 80.61</td>
</tr>

<tr>
<td align="left">LongBench-v2</td>
<td align="center">59.0</td>
<td align="center">57.7</td>
<td align="center">59.6</td>
<td align="center">57.5</td>
<td align="center">62.0</td>
<td align="center">🥇 64.3</td>
<td align="center">-</td>
<td align="center">🟢 60.2</td>
</tr>

<tr>
<td align="left">IFEval</td>
<td align="center">91.9</td>
<td align="center">91.3</td>
<td align="center">88.4</td>
<td align="center">93.53</td>
<td align="center">94.45</td>
<td align="center">93.35</td>
<td align="center">93.35</td>
<td align="center">🥇 94.82</td>
</tr>

<tr>
<td colspan="9" align="left"><b>🤖 General Agentic Tasks</b></td>
</tr>

<tr>
<td align="left">τ<sup>2</sup>-Bench</td>
<td align="center">🟢 81.2</td>
<td align="center">79.0</td>
<td align="center">74.53</td>
<td align="center">75.77</td>
<td align="center">81.93</td>
<td align="center">🥇 82.2</td>
<td align="center">81.63</td>
<td align="center">79.81</td>
</tr>

<tr>
<td align="left">VitaBench</td>
<td align="center">31.9</td>
<td align="center">35.6</td>
<td align="center">23.0</td>
<td align="center">30.0</td>
<td align="center">35.63</td>
<td align="center">🥇 49.04</td>
<td align="center">45.0</td>
<td align="center">🟢 38.75</td>
</tr>

<tr>
<td colspan="9" align="left"><b>🔬 Scientific Agentic Tasks</b></td>
</tr>

<tr>
<td align="left">MatTools</td>
<td align="center">21.0</td>
<td align="center">15.9</td>
<td align="center">34.1</td>
<td align="center">44.93</td>
<td align="center">63.8</td>
<td align="center">47.1</td>
<td align="center">🥇 68.8</td>
<td align="center">🟢 47.1</td>
</tr>

<tr>
<td align="left">MolBench-bind</td>
<td align="center">46.0</td>
<td align="center">48.7</td>
<td align="center">51.4</td>
<td align="center">45.95</td>
<td align="center">21.6</td>
<td align="center">37.8</td>
<td align="center">🥇 62.2</td>
<td align="center">🟢 56.8</td>
</tr>

</tbody>
</table>

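To make the 🟢 marker concrete, the sketch below recomputes the margin between Agents-A1 and the best of the three comparable (~35B) models for a few rows. The scores are copied from the table; the helper name `margin_over_comparable` is illustrative only.

```python
# Scores copied from the table: ([three comparable ~35B models], Agents-A1).
SCORES = {
    "BrowseComp": ([61.0, 67.93, 74.1], 75.51),
    "Seal-0": ([41.4, 38.74, 49.55], 56.36),
    "GAIA": ([59.8, 78.64, 82.52], 96.04),
    "tau2-Bench": ([81.2, 79.0, 74.53], 79.81),
}


def margin_over_comparable(comparable, ours):
    """Return our score minus the best comparable-model score, in points."""
    return round(ours - max(comparable), 2)


margins = {name: margin_over_comparable(c, o) for name, (c, o) in SCORES.items()}
for name, m in margins.items():
    status = "best in class" if m > 0 else "not best in class"
    print(f"{name}: {m:+.2f} ({status})")
```

A positive margin corresponds to a 🟢 or 🥇 in the Agents-A1 column. On τ²-Bench the margin is negative, which matches the 🟢 sitting on Qwen3.5-35B-A3B instead.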
## Usage

### SGLang

[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models.

Install SGLang with uv:

```shell
uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate

uv pip install sglang
```

See [its documentation](https://docs.sglang.ai/get_started/install.html) for more details.

The following commands create API endpoints at `http://localhost:8000/v1`:

- **Standard Version** (1 GPU, 262K context):

```shell
python -m sglang.launch_server \
    --model-path InternScience/Agents-A1 \
    --port 8000 \
    --tp-size 1 \
    --mem-fraction-static 0.8 \
    --context-length 262144 \
    --reasoning-parser qwen3
```
- **Tool Use**:

```shell
python -m sglang.launch_server \
    --model-path InternScience/Agents-A1 \
    --port 8000 \
    --tp-size 1 \
    --mem-fraction-static 0.8 \
    --context-length 262144 \
    --reasoning-parser qwen3 \
    --tool-call-parser qwen3_coder
```

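Once a server from the commands above is running, the endpoint can be called like any OpenAI-compatible chat API. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `http://localhost:8000/v1` and is addressed by the model name `InternScience/Agents-A1`, as in the launch commands; the helper names `build_chat_request` and `ask` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed: matches --port 8000 above
MODEL = "InternScience/Agents-A1"      # assumed: matches --model-path above


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text; needs a running server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request needs no server, so it is safe to inspect offline.
request = build_chat_request("What is the capital of France?")
print(request.get_method(), request.full_url)
```

Calling `ask("What is the capital of France?")` sends the request. It is left uncalled here so the snippet can run without a live server.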
### vLLM

[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.

Install vLLM with uv:

```shell
uv venv --python 3.12 --seed --managed-python
source .venv/bin/activate

uv pip install vllm --torch-backend=auto
```

See [its documentation](https://docs.vllm.ai/en/stable/getting_started/installation/index.html) for more details.

The following commands create API endpoints at `http://localhost:8000/v1`:

- **Standard Version** (1 GPU, 262K context):

```shell
vllm serve InternScience/Agents-A1 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --reasoning-parser qwen3
```
- **Tool Call**:

```shell
vllm serve InternScience/Agents-A1 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder
```
- **Text-Only** (skips vision encoder to free KV cache memory):

```shell
vllm serve InternScience/Agents-A1 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --reasoning-parser qwen3 \
    --language-model-only
```

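With the **Tool Call** server configuration, a client advertises tools in the request and reads structured calls back from the response. The sketch below shows the request and response shapes in the OpenAI function-calling format. The tool `get_weather`, the helper names, and the sample response are all hypothetical; the response is written by hand for illustration and is not real model output.

```python
import json


def build_tool_payload(prompt: str) -> dict:
    """Chat payload that advertises one hypothetical tool to the model."""
    return {
        "model": "InternScience/Agents-A1",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool name
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",
    }


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from a chat completion response."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # Arguments arrive as a JSON-encoded string and must be decoded.
    return [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]


# Hand-written sample in the OpenAI response shape, not real model output.
sample_response = {
    "choices": [{"message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": "{\"city\": \"Paris\"}"},
        }],
    }}]
}

payload = build_tool_payload("What is the weather in Paris?")
parsed_calls = extract_tool_calls(sample_response)
print(parsed_calls)
```

In a real loop, the client would run each parsed call, append the result as a `tool` message, and send the conversation back to the server.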
### Recommended Sampling Parameters

For the best generation quality, we recommend the following sampling parameters:

- `temperature`: 0.85
- `top_p`: 0.95
- `top_k`: 20
- `min_p`: 0.0
- `presence_penalty`: 1.1
- `repetition_penalty`: 1.0

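One way to apply these values consistently is to merge them into every request body. The sketch below uses an illustrative helper, `with_recommended_sampling`. Note that `top_k`, `min_p`, and `repetition_penalty` are not part of the standard OpenAI schema. Passing them as top-level request fields is an assumption here, so check how your serving engine expects them.

```python
# Values copied from the list above.
RECOMMENDED_SAMPLING = {
    "temperature": 0.85,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "presence_penalty": 1.1,
    "repetition_penalty": 1.0,
}


def with_recommended_sampling(payload: dict, **overrides) -> dict:
    """Return a copy of payload with the recommended values merged in.

    Keys already in payload win over the defaults; overrides win over both.
    """
    return {**RECOMMENDED_SAMPLING, **payload, **overrides}


request_body = with_recommended_sampling(
    {
        "model": "InternScience/Agents-A1",
        "messages": [{"role": "user", "content": "Summarize the plan."}],
    },
    temperature=0.6,  # example of a per-request override
)
print(request_body["temperature"], request_body["top_k"])
```

The defaults live in one place, and individual calls can still deviate from them when a task needs it.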
## Agent Capability Evaluation

To give the community a unified codebase for fair comparison, we have also open-sourced an evaluation framework for assessing agentic models across core capabilities, including tool use and multi-step reasoning. The evaluation code is in the [Agents-A1/evaluation](https://github.com/InternScience/Agents-A1/tree/main/evaluation) directory of the GitHub repository.

We use this framework to evaluate the released model under a standardized and reproducible setting.
Specifically, the model is tested on a set of agent-oriented tasks that require it to understand user goals, decompose complex instructions, interact with tools or environments when necessary, and produce final results. The evaluation results reported in the [Model Card](https://huggingface.co/InternScience/Agents-A1) are generated with this framework, so users can reproduce the experiments, compare other models under the same protocol, and extend the benchmark to new agent scenarios. (**Note:** To ensure a fair comparison, we report baseline results from each model's original technical report. If a report does not include a given benchmark, we evaluate that model using the same evaluation protocol as our model.)

For detailed evaluation scripts, task definitions, metrics, and reproduction instructions, please refer to the evaluation codebase.

## Citation

If you find our work helpful, please cite it.

```

```
figures/24px.svg
ADDED
figures/a1_benchmarks_altair_grid.svg
ADDED
figures/github-logo.svg
ADDED
figures/hf-logo.svg
ADDED
figures/logo_nobg.png
ADDED
figures/modelscope-logo.svg
ADDED