Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -1,589 +1,58 @@
|
|
| 1 |
---
|
| 2 |
base_model: Qwen/Qwen2.5-7B-Instruct
|
| 3 |
-
base_model_relation: finetune
|
| 4 |
library_name: transformers
|
| 5 |
-
|
| 6 |
-
language:
|
| 7 |
-
- en
|
| 8 |
-
pipeline_tag: text-generation
|
| 9 |
tags:
|
| 10 |
-
-
|
| 11 |
-
- function-calling
|
| 12 |
-
- tool-use
|
| 13 |
-
- h-res
|
| 14 |
-
- manifold-steering
|
| 15 |
-
- peft
|
| 16 |
-
- uraion-labs
|
| 17 |
-
- uraion
|
| 18 |
-
- iclr-2026
|
| 19 |
-
- associative-memory
|
| 20 |
-
- hopfield
|
| 21 |
-
- neural-collapse
|
| 22 |
-
- qwen2.5
|
| 23 |
-
- sft
|
| 24 |
- trl
|
| 25 |
-
-
|
| 26 |
-
|
| 27 |
-
- xlam
|
| 28 |
-
- toolace
|
| 29 |
-
datasets:
|
| 30 |
-
- NousResearch/hermes-function-calling-v1
|
| 31 |
-
- Salesforce/xlam-function-calling-60k
|
| 32 |
-
- mlabonne/FineTome-100k
|
| 33 |
-
- Salesforce/APIGen-MT-5k
|
| 34 |
-
- glaiveai/glaive-function-calling-v2
|
| 35 |
-
- Team-ACE/ToolACE
|
| 36 |
-
inference:
|
| 37 |
-
parameters:
|
| 38 |
-
temperature: 0.7
|
| 39 |
-
top_p: 0.95
|
| 40 |
-
max_new_tokens: 4096
|
| 41 |
-
---
|
| 42 |
-
|
| 43 |
-
<p align="center">
|
| 44 |
-
<picture>
|
| 45 |
-
<source media="(prefers-color-scheme: dark)" srcset="https://uraionlabs.com/public/icons/icon-192.png">
|
| 46 |
-
<img src="https://uraionlabs.com/public/icons/icon-192.png" alt="Uraion Labs" width="64" height="64">
|
| 47 |
-
</picture>
|
| 48 |
-
</p>
|
| 49 |
-
|
| 50 |
-
<p align="center">
|
| 51 |
-
<strong style="font-family: 'Instrument Serif', Georgia, serif; font-size: 2rem; color: #F7F4ED; letter-spacing: -0.02em;">
|
| 52 |
-
Uraion Labs
|
| 53 |
-
</strong>
|
| 54 |
-
<br>
|
| 55 |
-
<span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">Foundational systems research.</span>
|
| 56 |
-
</p>
|
| 57 |
-
|
| 58 |
-
<p align="center">
|
| 59 |
-
<strong style="font-family: 'Inter', sans-serif; font-size: 1.15rem; color: #E45A1A;">
|
| 60 |
-
Uraion-Agent-Steer
|
| 61 |
-
</strong>
|
| 62 |
-
<br>
|
| 63 |
-
<span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">
|
| 64 |
-
Agentic LLM fine-tuned via Hierarchical Residual Steering (H-Res) — steers activations, not weights.
|
| 65 |
-
</span>
|
| 66 |
-
</p>
|
| 67 |
-
|
| 68 |
-
---
|
| 69 |
-
|
| 70 |
-
**Uraion-Agent-Steer** is a 7-billion parameter model adapted from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) using **H-Res (Hierarchical Residual Steering)** — a novel PEFT method from ["Parallel Manifold Steering"](https://arxiv.org/abs/2606.24396) (ICLR Workshop 2026). Rather than modifying model weights (LoRA) or injecting synthetic tokens (VPT/Prefix Tuning), H-Res learns a **state-dependent vector field** that steers hidden activations into task-specific attractors — preserving the foundation model's associative memory while adapting it for agentic tool use.
|
| 71 |
-
|
| 72 |
-
This is a research artifact in Uraion Labs' systems-first approach: studying novel adaptation mechanisms, the harness layer, evaluation, and deployment of agent-capable models. It is the first publicly available model trained with the full H-Res method.
|
| 73 |
-
|
| 74 |
-
**Intelligence is a systems problem.** This model is one piece of that system — and the adaptation method itself is part of the research.
|
| 75 |
-
|
| 76 |
-
---
|
| 77 |
-
|
| 78 |
-
## The H-Res Method
|
| 79 |
-
|
| 80 |
-
### The problem with existing PEFT
|
| 81 |
-
|
| 82 |
-
| Method | Mechanism | Fatal flaw |
|
| 83 |
-
|--------|-----------|------------|
|
| 84 |
-
| **LoRA** | Modifies weights globally | Catastrophic interference — distorts retrieval dynamics of pre-trained memories |
|
| 85 |
-
| **VPT / Prefix Tuning** | Appends synthetic tokens to input | Buffer congestion — dilutes attention probability mass, weakens associative recall |
|
| 86 |
-
| **H-Res** | Steers activations via vector field | *None of the above* — operates orthogonal to weights and input buffer |
|
| 87 |
-
|
| 88 |
-
### How H-Res works
|
| 89 |
-
|
| 90 |
-
H-Res frames Transformer adaptation as a **control problem on the activation manifold**. Each layer `l` receives a state-dependent residual:
|
| 91 |
-
|
| 92 |
-
```
|
| 93 |
-
z_{l+1} = Attn(z_l) + FFN(z_l) + λ · H_θ(z_l)
|
| 94 |
-
|
| 95 |
-
where H_θ(x) = W_up · GeLU(W_down · x)
|
| 96 |
-
```
|
| 97 |
-
|
| 98 |
-
- **W_down ∈ ℝ^{r×d}**: projects to a low-rank "control manifold" (bottleneck)
|
| 99 |
-
- **W_up ∈ ℝ^{d×r}**: projects the steering signal back to activation space
|
| 100 |
-
- **W_up initialized to zero** — no initialization shock; training starts from the pre-trained energy minimum
|
| 101 |
-
- **λ** — learnable per-layer scaling factor
|
| 102 |
-
- **Applied parallel to self-attention** — via forward hooks, orthogonal to the frozen backbone
|
| 103 |
-
|
| 104 |
-
### Theoretical guarantees (from the paper)
|
| 105 |
-
|
| 106 |
-
| Property | Proof |
|
| 107 |
-
|----------|-------|
|
| 108 |
-
| **Attention entropy preserved** | No synthetic tokens → constant sequence length → H(A_cls) minimal |
|
| 109 |
-
| **Neural Collapse facilitated** | Residual adapter acts as Maxwell's Demon, filtering task-irrelevant noise |
|
| 110 |
-
| **Zero initialization** | W_up = 0 → H_θ(z) = 0 at t=0 → training starts from global energy minimum |
|
| 111 |
-
| **SSM-compatible** | Operates entirely in residual stream — compatible with Mamba, S4, DeltaNet |
|
| 112 |
-
| **Multi-task orthogonality** | Null-Space Projection of gradients across tasks (Eq. 6 in paper) |
|
| 113 |
-
|
| 114 |
-
---
|
| 115 |
-
|
| 116 |
-
## Contents
|
| 117 |
-
|
| 118 |
-
- [Model Details](#model-details)
|
| 119 |
-
- [H-Res Architecture (Deep Dive)](#h-res-architecture-deep-dive)
|
| 120 |
-
- [Intended Uses & Limitations](#intended-uses--limitations)
|
| 121 |
-
- [Training Data](#training-data)
|
| 122 |
-
- [Training Procedure](#training-procedure)
|
| 123 |
-
- [Hyperparameters](#hyperparameters)
|
| 124 |
-
- [Training Loss](#training-loss)
|
| 125 |
-
- [Quickstart](#quickstart)
|
| 126 |
-
- [H-Res Adapter Analysis](#h-res-adapter-analysis)
|
| 127 |
-
- [Hardware & Infrastructure](#hardware--infrastructure)
|
| 128 |
-
- [GGUF Availability](#gguf-availability)
|
| 129 |
-
- [Ethical Considerations](#ethical-considerations)
|
| 130 |
-
- [Citations](#citations)
|
| 131 |
-
|
| 132 |
-
---
|
| 133 |
-
|
| 134 |
-
## Model Details
|
| 135 |
-
|
| 136 |
-
| Property | Value |
|
| 137 |
-
|----------|-------|
|
| 138 |
-
| **Base model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
|
| 139 |
-
| **Architecture** | Qwen2ForCausalLM, a 28-layer pure Transformer (RoPE, SwiGLU, RMSNorm) |
|
| 140 |
-
| **Adaptation method** | **H-Res (Hierarchical Residual Steering)** — state-dependent vector field |
|
| 141 |
-
| **Context length** | 32,768 tokens (native, inherited) |
|
| 142 |
-
| **Parameters** | ~7.6B total, 12.8M H-Res trainable (0.17%) |
|
| 143 |
-
| **H-Res rank** | r = 64 per layer |
|
| 144 |
-
| **H-Res layers** | 28/28 injected (all layers compatible) |
|
| 145 |
-
| **Precision** | BF16 (full precision — no quantization of base model) |
|
| 146 |
-
| **License** | Apache 2.0 (inherited from Qwen2.5) |
|
| 147 |
-
| **On-disk size** | ~15.3 GB (BF16 safetensors) |
|
| 148 |
-
| **Paper** | [arXiv:2606.24396](https://arxiv.org/abs/2606.24396) — ICLR Workshop 2026 |
|
| 149 |
-
|
| 150 |
-
### Architecture choice
|
| 151 |
-
|
| 152 |
-
Qwen2.5-7B-Instruct was chosen for this H-Res implementation because:
|
| 153 |
-
|
| 154 |
-
1. **Pure Transformer** — 28 identical decoder layers with standard `input_layernorm` + `self_attn` + `post_attention_layernorm` + `mlp` — cleanest architecture for H-Res hook injection
|
| 155 |
-
2. **Apache 2.0 license** — no gated access, no approval required, fully open
|
| 156 |
-
3. **Strong instruct base** — already instruction-tuned, providing a solid foundation for agentic adaptation
|
| 157 |
-
4. **7B weight class** — punches above its weight on agent benchmarks while fitting comfortably on A100-40GB
|
| 158 |
-
|
| 159 |
-
---
|
| 160 |
-
|
| 161 |
-
## H-Res Architecture (Deep Dive)
|
| 162 |
-
|
| 163 |
-
### Injection mechanism
|
| 164 |
-
|
| 165 |
-
H-Res adapters are injected into each transformer layer via **PyTorch forward hooks** — no monkey-patching of forward methods, no model code modification:
|
| 166 |
-
|
| 167 |
-
```
|
| 168 |
-
Layer forward (simplified):
|
| 169 |
-
┌─────────────────────────────────────────────┐
|
| 170 |
-
│ residual = hidden_states │
|
| 171 |
-
│ normed = input_layernorm(hidden_states) │
|
| 172 |
-
│ │
|
| 173 |
-
│ attn_out = self_attn(normed) ← frozen │
|
| 174 |
-
│ hres_out = hres(normed) ← trained │ ← Hook: captures normed, adds to attn output
|
| 175 |
-
│ │
|
| 176 |
-
│ hidden_states = residual + attn_out + hres_out │
|
| 177 |
-
│ hidden_states = hidden_states + mlp(norm(hidden_states)) │
|
| 178 |
-
└─────────────────────────────────────────────┘
|
| 179 |
-
```
|
| 180 |
-
|
| 181 |
-
### Per-layer H-Res parameters
|
| 182 |
-
|
| 183 |
-
Each of the 28 layers contains:
|
| 184 |
-
|
| 185 |
-
```
|
| 186 |
-
HResAdapter:
|
| 187 |
-
W_down: Linear(3584 → 64, bias=False) 229,376 params
|
| 188 |
-
W_up: Linear(64 → 3584, bias=False) 229,376 params
|
| 189 |
-
scale: scalar (learnable) 1 param
|
| 190 |
-
─────────────────────────────────────────────────────
|
| 191 |
-
Total per layer: 458,753 params
|
| 192 |
-
Total (28 layers): 12,845,084 params
|
| 193 |
-
% of base model (7.6B): 0.17%
|
| 194 |
-
```
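The per-layer count can be recomputed directly from the layer shapes: two bias-free projections between `d_model` and the bottleneck rank, plus one scalar. The sketch below does that arithmetic; the function name `hres_param_count` is illustrative and not part of the released code, and the 7.6B base size is the rounded figure quoted in this card.

```python
def hres_param_count(d_model: int, rank: int, num_layers: int) -> dict:
    """Count H-Res parameters: W_down (rank x d_model), W_up (d_model x rank), one scalar."""
    w_down = d_model * rank          # Linear(d_model -> rank, bias=False)
    w_up = rank * d_model            # Linear(rank -> d_model, bias=False)
    per_layer = w_down + w_up + 1    # +1 for the learnable scale
    return {
        "w_down": w_down,
        "w_up": w_up,
        "per_layer": per_layer,
        "total": per_layer * num_layers,
    }


counts = hres_param_count(d_model=3584, rank=64, num_layers=28)
fraction = counts["total"] / 7.6e9   # rounded base-model size (assumption)

print(counts)
print(f"{fraction:.2%} of the base model")
```

Changing `rank` in the call shows how the trainable fraction scales linearly with the bottleneck width.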
|
| 195 |
-
|
| 196 |
-
### Initialization (per paper Section 2.3)
|
| 197 |
-
|
| 198 |
-
```python
|
| 199 |
-
W_down ~ N(0, 1/d_model) # Normal with σ = 1/√3584
|
| 200 |
-
W_up = 0 # Zero — preserves pre-trained energy minimum
|
| 201 |
-
scale = 0.1 # Small constant — gentle ramp-up
|
| 202 |
-
```
|
| 203 |
-
|
| 204 |
-
At initialization, H_θ(x) = 0 for all x → the model behaves identically to the frozen base. Training gradually "turns on" the steering field.
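The zero-output property at initialization can be checked with a tiny dependency-free sketch. This is not the released `HResAdapter`: it is a toy version on plain Python lists, with assumed dimensions (`d_model=8`, `rank=2`) and hypothetical helper names (`make_adapter`, `hres_forward`), meant only to show that a zero `W_up` silences the adapter regardless of `W_down`.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_adapter(d_model: int, rank: int, seed: int = 0) -> dict:
    """Initialize as described above: W_down ~ N(0, 1/d_model), W_up = 0, scale = 0.1."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(d_model)  # standard deviation for variance 1/d_model
    w_down = [[rng.gauss(0.0, sigma) for _ in range(d_model)] for _ in range(rank)]
    w_up = [[0.0] * rank for _ in range(d_model)]
    return {"w_down": w_down, "w_up": w_up, "scale": 0.1}


def hres_forward(adapter: dict, x: list) -> list:
    """Compute scale * W_up * GeLU(W_down * x) for a single activation vector."""
    hidden = [gelu(h) for h in matvec(adapter["w_down"], x)]
    return [adapter["scale"] * y for y in matvec(adapter["w_up"], hidden)]


adapter = make_adapter(d_model=8, rank=2)
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 3.0]

steer = hres_forward(adapter, x)
print("at init:", steer)

# Simulate one entry of W_up moving away from zero during training.
adapter["w_up"][0][0] = 1.0
steer_after = hres_forward(adapter, x)
print("after a W_up update:", steer_after)
```

Only the output coordinate whose `W_up` row became non-zero receives a steering signal, which is why training can switch the field on gradually.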
|
| 205 |
-
|
| 206 |
-
### What H-Res is NOT
|
| 207 |
-
|
| 208 |
-
- **NOT LoRA** — doesn't modify frozen weights; computes input-dependent residuals
|
| 209 |
-
- **NOT an adapter** — doesn't sit sequentially after attention/MLP; runs *parallel* to self-attention
|
| 210 |
-
- **NOT a prompt method** — doesn't add tokens to the input sequence
|
| 211 |
-
- **NOT a mixture-of-experts** — all layers are always active; the "expertise" is in the learned vector field
|
| 212 |
-
|
| 213 |
-
---
|
| 214 |
-
|
| 215 |
-
## Intended Uses & Limitations
|
| 216 |
-
|
| 217 |
-
### Intended use
|
| 218 |
-
|
| 219 |
-
- **Tool-calling agents** — function calling, API orchestration, multi-turn tool use
|
| 220 |
-
- **Agent frameworks** — drop-in replacement for agent runtimes (OpenAI-compatible via vLLM)
|
| 221 |
-
- **Systems research** — studying the H-Res adaptation mechanism, its properties, and its limits
|
| 222 |
-
- **Associative retrieval tasks** — the H-Res method specifically excels at retrieval (26% better than LoRA on SQuAD per the paper)
|
| 223 |
-
|
| 224 |
-
### Out-of-scope
|
| 225 |
-
|
| 226 |
-
- **Production deployment without validation** — research artifact; evaluate on your specific use case
|
| 227 |
-
- **High-stakes decision making** — not intended for medical, legal, or financial advice without human oversight
|
| 228 |
-
- **Unsupported languages** — trained exclusively on English data
|
| 229 |
-
- **Multimodal tasks** — text-only fine-tune
|
| 230 |
-
|
| 231 |
-
### Limitations
|
| 232 |
-
|
| 233 |
-
- **Trained for 1 epoch** on ~35K examples. More data/epochs would improve tool-calling reliability.
|
| 234 |
-
- **H-Res is a research method** — this is the first public deployment; edge cases may exist.
|
| 235 |
-
- **GGUF conversion** — H-Res adapters are state-dependent (nonlinear), so they can't be directly merged into base weights for standard GGUF conversion. A LoRA-distilled GGUF version is available separately.
|
| 236 |
-
- **May produce malformed tool calls** in edge cases — validate output before execution.
|
| 237 |
-
- **7B weight class** — while punching above its weight, has inherent capacity limits compared to larger models.
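
Because malformed tool calls are possible, a caller may want to parse and check each call before executing anything. The sketch below assumes the model emits Hermes-style `<tool_call>{...}</tool_call>` blocks containing JSON with `name` and `arguments` keys, as the Qwen2.5 chat template does; the registry `ALLOWED_TOOLS` and the `get_weather` tool are hypothetical placeholders.

```python
import json
import re

# Assumed wrapper format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

# Hypothetical registry: tool name -> set of required argument names.
ALLOWED_TOOLS = {"get_weather": {"city"}}


def extract_valid_tool_calls(text: str) -> list:
    """Return only the tool calls that parse as JSON and match the registry."""
    valid = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: drop rather than guess
        if not isinstance(call, dict):
            continue
        name = call.get("name")
        args = call.get("arguments")
        if not isinstance(name, str) or name not in ALLOWED_TOOLS:
            continue  # unknown tool
        if not isinstance(args, dict) or not ALLOWED_TOOLS[name].issubset(args):
            continue  # missing required arguments
        valid.append(call)
    return valid


good = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
broken = '<tool_call>{"name": "get_weather", "arguments": {"city": </tool_call>'
unknown = '<tool_call>{"name": "delete_files", "arguments": {}}</tool_call>'

print(extract_valid_tool_calls(good + broken + unknown))
```

Rejected calls are silently dropped here; a real agent loop would more likely return an error message to the model so it can retry.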
|
| 238 |
-
|
| 239 |
-
---
|
| 240 |
-
|
| 241 |
-
## Training Data
|
| 242 |
-
|
| 243 |
-
Six datasets were curated for agentic capability — prioritizing function-calling and tool-use signal over raw instruction volume:
|
| 244 |
-
|
| 245 |
-
| Dataset | Type | Samples | Focus |
|
| 246 |
-
|---------|------|---------|-------|
|
| 247 |
-
| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | Function calling | 1,893 | Single-turn and multi-turn tool use conversations (MIT) |
|
| 248 |
-
| [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | Function calling | 10,000 | Diverse API function calling (sampled from 60K, MIT) |
|
| 249 |
-
| [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) | Instruction following | 20,000 | General instruct/chat data (sampled from 100K, MIT) |
|
| 250 |
-
| [Salesforce/APIGen-MT-5k](https://huggingface.co/datasets/Salesforce/APIGen-MT-5k) | API generation | 5,000 | Multi-turn API call generation across diverse APIs (MIT) |
|
| 251 |
-
| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | Function calling | 8,000 | Multi-turn tool-use conversations (MIT) |
|
| 252 |
-
| [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) | Tool use | 8,000 | Agentic tool-use conversations (Apache 2.0) |
|
| 253 |
-
| **Total** | | **52,893 raw → 34,893 filtered** | |
|
| 254 |
-
|
| 255 |
-
All data formatted via `tokenizer.apply_chat_template()` with the Qwen2.5 ChatML template. Examples without a `user` role were filtered. Sequence length capped at 2,048 tokens.
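
The role filter described above can be sketched as follows. The actual preprocessing script is not shown in this card, so the ShareGPT field names (`from`, `value`) and the role mapping are assumptions based on the common ShareGPT convention, and the helper names are illustrative.

```python
# Assumed ShareGPT -> chat-role mapping; unknown roles are passed through unchanged.
SHAREGPT_ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(turns: list) -> list:
    """Convert ShareGPT-style turns into role/content messages for a chat template."""
    return [
        {"role": SHAREGPT_ROLE_MAP.get(turn["from"], turn["from"]), "content": turn["value"]}
        for turn in turns
    ]


def has_user_turn(messages: list) -> bool:
    """True if at least one message comes from the user role."""
    return any(message["role"] == "user" for message in messages)


raw_examples = [
    [{"from": "human", "value": "What is 2 + 2?"}, {"from": "gpt", "value": "4"}],
    [{"from": "system", "value": "You are a tool."}, {"from": "gpt", "value": "Ready."}],
]

converted = [sharegpt_to_messages(turns) for turns in raw_examples]
kept = [messages for messages in converted if has_user_turn(messages)]

print(f"kept {len(kept)} of {len(converted)} examples")
```

Each surviving message list would then be rendered with `tokenizer.apply_chat_template()` and truncated to the 2,048-token cap.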
|
| 256 |
-
|
| 257 |
-
---
|
| 258 |
-
|
| 259 |
-
## Training Procedure
|
| 260 |
-
|
| 261 |
-
### Framework
|
| 262 |
-
|
| 263 |
-
- **Training**: HuggingFace TRL `SFTTrainer` with `SFTConfig`
|
| 264 |
-
- **Adaptation**: H-Res — custom `HResAdapter` injected via forward hooks (no PEFT library dependency for the core method)
|
| 265 |
-
- **Quantization**: None — full BF16 precision for base model (H-Res adds only 0.17% trainable params)
|
| 266 |
-
- **Attention**: PyTorch SDPA (`attn_implementation="sdpa"`)
|
| 267 |
-
- **Loss**: Standard causal language modeling (no packing)
|
| 268 |
-
|
| 269 |
-
### Pipeline
|
| 270 |
-
|
| 271 |
-
1. **Model loading**: BF16 full precision via `AutoModelForCausalLM.from_pretrained()`
|
| 272 |
-
2. **H-Res injection**: Forward hooks on `input_layernorm` (capture) + `self_attn` (inject)
|
| 273 |
-
3. **Base model freeze**: `model.requires_grad_(False)` — only H-Res params trainable
|
| 274 |
-
4. **Dataset processing**: ShareGPT → ChatML → filtered → concatenated → shuffled
|
| 275 |
-
5. **Training**: `SFTTrainer` with `dataset_text_field="text"`, `packing=False`, `gradient_checkpointing=True`
|
| 276 |
-
6. **Export**: `model.save_pretrained(safe_serialization=True)` — H-Res adapters embedded in model state dict
|
| 277 |
-
7. **Upload**: `HfApi.upload_folder()` → `UraionLabs/Uraion-Agent-Steer`
|
| 278 |
-
|
| 279 |
-
### Novel aspects
|
| 280 |
-
|
| 281 |
-
This training represents the **first public implementation** of the full H-Res method:
|
| 282 |
-
|
| 283 |
-
- **Hook-based injection** — no model code modification; works with any HuggingFace Transformer
|
| 284 |
-
- **Full BF16 precision** — no quantization noise; H-Res is parameter-efficient enough to not need it
|
| 285 |
-
- **Learnable scale parameter λ** — per-layer, initialized at 0.1, allowing layers to independently adjust steering intensity
|
| 286 |
-
- **Architecture-agnostic** — the same injection code works on Llama, Mistral, Qwen2/3, Gemma, and Phi
|
| 287 |
-
|
| 288 |
-
---
|
| 289 |
-
|
| 290 |
-
## Hyperparameters
|
| 291 |
-
|
| 292 |
-
### H-Res
|
| 293 |
-
|
| 294 |
-
| Parameter | Value |
|
| 295 |
-
|-----------|-------|
|
| 296 |
-
| `r` (bottleneck rank) | 64 |
|
| 297 |
-
| `d_model` (hidden size) | 3584 |
|
| 298 |
-
| `W_down init` | N(0, 1/d_model) |
|
| 299 |
-
| `W_up init` | 0 (zero) |
|
| 300 |
-
| `scale init` | 0.1 |
|
| 301 |
-
| `activation` | GeLU |
|
| 302 |
-
| `bias` | None |
|
| 303 |
-
|
| 304 |
-
### Training
|
| 305 |
-
|
| 306 |
-
| Parameter | Value |
|
| 307 |
-
|-----------|-------|
|
| 308 |
-
| **Sequence length** | 2048 |
|
| 309 |
-
| **Effective batch size** | 32 |
|
| 310 |
-
| **Per-device batch** | 2 |
|
| 311 |
-
| **Gradient accumulation** | 16 |
|
| 312 |
-
| **Learning rate** | 1×10⁻⁴ |
|
| 313 |
-
| **LR scheduler** | Cosine with warmup |
|
| 314 |
-
| **Warmup ratio** | 0.03 |
|
| 315 |
-
| **Optimizer** | AdamW 8-bit |
|
| 316 |
-
| **Epochs** | 1 |
|
| 317 |
-
| **Max steps** | 1,091 |
|
| 318 |
-
| **Weight decay** | 0.0 |
|
| 319 |
-
| **Gradient checkpointing** | True (non-reentrant) |
|
| 320 |
-
| **Precision** | BF16 |
|
| 321 |
-
| **Logging steps** | 10 |
|
| 322 |
-
| **Save steps** | 50 |
|
| 323 |
-
| **Save total limit** | 3 |
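
The effective batch size and step count in the table are linked by simple arithmetic. The sketch below uses ceiling division over the 34,893 filtered examples as an approximation; the exact rounding inside the trainer can differ by library version, and the function name `optimizer_steps` is illustrative.

```python
import math


def optimizer_steps(num_examples: int, per_device_batch: int, grad_accum: int,
                    num_devices: int = 1, epochs: int = 1) -> tuple:
    """Return (effective batch size, total optimizer steps) using ceiling division."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


effective_batch, steps = optimizer_steps(34_893, per_device_batch=2, grad_accum=16)
print(effective_batch, steps)
```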
|
| 324 |
-
|
| 325 |
---
|
| 326 |
|
| 327 |
-
## Training Loss
|
| 328 |
-
|
| 329 |
-
| Step | Loss | Δ from start | Notes |
|
| 330 |
-
|------|------|-------------|-------|
|
| 331 |
-
| 10 | 1.310 | — | Initial — H-Res scale still ramping |
|
| 332 |
-
| 20 | 1.264 | ↓ 3.5% | W_up beginning to activate |
|
| 333 |
-
| 50 | 1.013 | ↓ 22.7% | First checkpoint saved; steering field forming |
|
| 334 |
-
| 100 | 0.879 | ↓ 32.9% | Rapid convergence phase |
|
| 335 |
-
| 200 | 0.741 | ↓ 43.4% | Entering fine-tuning regime |
|
| 336 |
-
| 300 | 0.745 | ↓ 43.1% | Stable convergence |
|
| 337 |
-
| 400 | 0.699 | ↓ 46.6% | Steady improvement |
|
| 338 |
-
| 500 | 0.689 | ↓ 47.4% | Approaching plateau |
|
| 339 |
-
| 600 | 0.645 | ↓ 50.8% | Best single-step loss |
|
| 340 |
-
| 700 | 0.688 | ↓ 47.5% | Minor oscillation — normal |
|
| 341 |
-
| 800 | 0.646 | ↓ 50.7% | Consistent low-loss regime |
|
| 342 |
-
| 900 | 0.663 | ↓ 49.4% | Stable |
|
| 343 |
-
| 1000 | 0.67 | ↓ 48.9% | Final stretch |
|
| 344 |
-
| **1091** | **0.657** | **↓ 49.8%** | **Final — 50% loss reduction** |
|
| 345 |
-
|
| 346 |
-
**Key observations:**
|
| 347 |
-
- **Rapid early convergence** — 22.7% loss reduction by step 50 (first 4.6% of training)
|
| 348 |
-
- **Smooth learning curve** — no spikes, no divergence, consistent downward trend
|
| 349 |
-
- **~50% total loss reduction**: from 1.310 to 0.657 (49.8%)
|
| 350 |
-
- **H-Res's zero-initialization advantage**: no "initialization shock" means the model starts from the base model's behavior and improves steadily, with only minor oscillations after step 200
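
The "Δ from start" column can be reproduced from the logged values. The sketch below recomputes the relative reduction against the step-10 loss and lists the logged steps at which the loss rose compared with the previous logged step; the variable and function names are illustrative.

```python
# Logged (step, loss) pairs copied from the table above.
LOGGED = [
    (10, 1.310), (20, 1.264), (50, 1.013), (100, 0.879), (200, 0.741),
    (300, 0.745), (400, 0.699), (500, 0.689), (600, 0.645), (700, 0.688),
    (800, 0.646), (900, 0.663), (1000, 0.67), (1091, 0.657),
]


def relative_drop(start: float, current: float) -> float:
    """Fractional loss reduction relative to the starting loss."""
    return (start - current) / start


start_loss = LOGGED[0][1]
drops = {step: f"{relative_drop(start_loss, loss) * 100:.1f}%" for step, loss in LOGGED}

# Steps where the logged loss is higher than at the previous logged step.
rises = [
    step for (_, prev), (step, loss) in zip(LOGGED, LOGGED[1:]) if loss > prev
]

print(drops)
print("loss rose at steps:", rises)
```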
|
| 351 |
-
|
| 352 |
-
---
|
| 353 |
|
| 354 |
-
|
|
|
|
| 355 |
|
| 356 |
-
## Quickstart
|
| 357 |
|
| 358 |
```python
|
| 359 |
-
import torch
|
| 360 |
-
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 361 |
-
|
| 362 |
-
model_name = "UraionLabs/Uraion-Agent-Steer"
|
| 363 |
-
|
| 364 |
-
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
|
| 365 |
-
model = AutoModelForCausalLM.from_pretrained(
|
| 366 |
-
model_name,
|
| 367 |
-
torch_dtype=torch.bfloat16,
|
| 368 |
-
device_map="auto",
|
| 369 |
-
trust_remote_code=True,
|
| 370 |
-
)
|
| 371 |
-
|
| 372 |
-
# The model includes H-Res adapters — no extra loading needed
|
| 373 |
-
messages = [
|
| 374 |
-
{"role": "system", "content": "You are Uraion-Agent-Steer, an agent with tool-use capabilities. Use tools when appropriate."},
|
| 375 |
-
{"role": "user", "content": "What's the weather in Tokyo? Should I bring an umbrella?"},
|
| 376 |
-
]
|
| 377 |
-
|
| 378 |
-
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
| 379 |
-
inputs = tokenizer(text, return_tensors="pt").to(model.device)
|
| 380 |
-
|
| 381 |
-
outputs = model.generate(
|
| 382 |
-
**inputs,
|
| 383 |
-
max_new_tokens=512,
|
| 384 |
-
temperature=0.7,
|
| 385 |
-
top_p=0.95,
|
| 386 |
-
do_sample=True,
|
| 387 |
-
)
|
| 388 |
-
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
|
| 389 |
-
print(response)
|
| 390 |
-
```
|
| 391 |
-
|
| 392 |
-
### With `pipeline`
|
| 393 |
-
|
| 394 |
-
```python
|
| 395 |
-
import torch
|
| 396 |
from transformers import pipeline
|
| 397 |
|
| 398 |
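-
pipe = pipeline(
|
| 399 |
-
"text-generation",
|
| 400 |
-
model="UraionLabs/Uraion-Agent-Steer",
|
| 401 |
-
torch_dtype=torch.bfloat16,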
|
| 402 |
-
device_map="auto",
|
| 403 |
-
trust_remote_code=True,
|
| 404 |
-
)
|
| 405 |
-
|
| 406 |
-
messages = [
|
| 407 |
-
{"role": "system", "content": "You are a helpful agent with access to tools."},
|
| 408 |
-
{"role": "user", "content": "Search for the latest AI research papers on arxiv."},
|
| 409 |
-
]
|
| 410 |
-
output = pipe(messages, max_new_tokens=512, temperature=0.7, top_p=0.95)
|
| 411 |
-
print(output[0]["generated_text"][-1]["content"] if isinstance(output[0]["generated_text"], list) else output[0]["generated_text"])
|
| 412 |
```
|
| 413 |
|
| 414 |
-
|
| 415 |
-
|
| 416 |
-
## H-Res Adapter Analysis
|
| 417 |
-
|
| 418 |
-
After training, we inspected the learned H-Res adapters across all 28 layers:
|
| 419 |
|
| 420 |
-
| Layer | λ (scale) | ‖W_up‖ | ‖W_down‖ | Interpretation |
|
| 421 |
-
|-------|-----------|--------|----------|-------------------|
|
| 422 |
-
| 0 (early) | 0.1001 | 0.0000 | 7.94 | **Silent** — shallow layers don't steer |
|
| 423 |
-
| 8 (mid) | 0.1001 | 2.12 | 8.45 | Moderate steering |
|
| 424 |
-
| 16 (mid-deep) | 0.1001 | 2.87 | 9.12 | Active steering |
|
| 425 |
-
| 24 (deep) | 0.1001 | 3.12 | 9.56 | Strong steering |
|
| 426 |
-
| 27 (final) | 0.1001 | **3.72** | **9.69** | **Maximum steering** |
|
| 427 |
|
| 428 |
-
**Key finding:** Steering intensity increases monotonically with layer depth. Early layers (0–3) have W_up ≈ 0 — the adapter is effectively dormant. Deep layers (20–27) have the strongest steering activity. This aligns with the paper's theoretical prediction: H-Res acts primarily on high-level semantic representations in deeper layers, while preserving low-level features in early layers.
|
| 429 |
|
| 430 |
-
The scale parameter λ stayed at ~0.1 across all layers — the model preferred to learn through W_up/W_down rather than adjusting the global scaling factor.
|
| 431 |
-
|
| 432 |
-
---
|
| 433 |
|
| 434 |
-
## Hardware & Infrastructure
|
| 435 |
|
| 436 |
-
|
| 437 |
-
|-----------|--------|
|
| 438 |
-
| **Provisioning** | Google Colab CLI (`colab-cli`) via OAuth2 |
|
| 439 |
-
| **GPU** | 1× NVIDIA A100-SXM4-40GB |
|
| 440 |
-
| **Runtime** | `colab run --gpu A100 --keep --timeout 28800` |
|
| 441 |
-
| **Training time** | ~3 hours (1,091 steps at ~10s/step) |
|
| 442 |
-
| **VRAM usage** | ~35 GB (7.6B BF16 base + 12.8M H-Res + activations + optimizer) |
|
| 443 |
-
| **Setup** | Self-installing dependencies via pip |
|
| 444 |
-
| **Session lifecycle** | `colab run` → auto-execute → `--keep` → training → auto-upload → session release |
|
| 445 |
-
|
| 446 |
-
Training dependencies auto-installed on Colab: `transformers>=4.57`, `trl>=0.21`, `datasets`, `accelerate`, `safetensors`, `huggingface_hub`.
|
| 447 |
-
|
| 448 |
-
---
|
| 449 |
-
|
| 450 |
-
## GGUF Availability

H-Res adapters are **state-dependent** (a nonlinear function of the input), so they cannot be merged directly into the base weights for standard GGUF/llama.cpp conversion. A separate **LoRA-distilled version** is available for GGUF users:

| Format | Repository | Notes |
|--------|-----------|-------|
| **Safetensors (H-Res)** | `UraionLabs/Uraion-Agent-Steer` | This repo: full quality, original H-Res method |
| **GGUF (LoRA-distilled)** | `UraionLabs/Uraion-Agent-Steer-GGUF` | LoRA trained on same data, merged, quantized to all common variants |

For maximum quality, use this safetensors release. For local llama.cpp/Ollama/LM Studio inference, use the GGUF release.

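
To illustrate why a state-dependent adapter cannot be folded into the base weights, the sketch below contrasts a LoRA-style linear update with a generic nonlinear bottleneck adapter. The `tanh` bottleneck, the toy matrices, and every function name here are assumptions for illustration; this is not the H-Res formulation. A linear update `up @ down` can be added to `W` once and then forgotten, whereas a nonlinear adapter's contribution changes with the input, so no single merged matrix reproduces it.

```python
import math


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(u, v):
    return [x + y for x, y in zip(u, v)]


W = [[1.0, 0.5], [-0.5, 2.0]]   # toy base weight (2x2)
down = [[0.3, -0.2]]            # toy down-projection (1x2)
up = [[0.4], [0.1]]             # toy up-projection (2x1)


def lora_separate(x):
    """Base layer plus a linear low-rank update, applied separately."""
    return add(matvec(W, x), matvec(up, matvec(down, x)))


# The linear update collapses into one matrix: W_merged = W + up @ down.
delta = matmul(up, down)
W_merged = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]


def lora_merged(x):
    return matvec(W_merged, x)


def nonlinear_adapter(x):
    """Base layer plus a tanh bottleneck; the update depends on x."""
    hidden = [math.tanh(h) for h in matvec(down, x)]
    return add(matvec(W, x), matvec(up, hidden))


x = [1.0, 1.0]
print("LoRA separate:", lora_separate(x))
print("LoRA merged:  ", lora_merged(x))

# Any fixed matrix is linear, so f(10x) would equal 10 * f(x).
# The nonlinear adapter breaks that, so no merged matrix can match it.
scaled_input = nonlinear_adapter([10.0, 10.0])
scaled_output = [10 * v for v in nonlinear_adapter(x)]
print("adapter(10x):   ", scaled_input)
print("10 * adapter(x):", scaled_output)
```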

---

- Function-calling capabilities could automate actions without human oversight; always validate tool calls before execution.
- The model has not undergone safety alignment beyond the base model's existing safeguards.
- The H-Res method is novel; long-term behavior and failure modes are still being studied.
- This is a **research-stage artifact** from Uraion Labs. We are a systems research lab, not a product company. Use accordingly.
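
As one possible reading of "validate tool calls before execution", the sketch below checks a model-emitted call against an allowlist before anything runs. It assumes the call arrives as a JSON object with `name` and `arguments` fields; that shape, the `ALLOWED_TOOLS` registry, and the tool names are hypothetical and should be adapted to the chat template and tools actually in use. Rejecting unknown tools and unexpected arguments by default keeps the failure mode conservative.

```python
import json

# Hypothetical registry: tool name -> required argument names and types.
ALLOWED_TOOLS = {
    "get_weather": {"city": str},
    "search_docs": {"query": str, "limit": int},
}


def validate_tool_call(raw_call):
    """Return (ok, reason) for a tool call given as a JSON string."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "tool call must be a JSON object"

    name = call.get("name")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        return False, f"tool not allowed: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"

    schema = ALLOWED_TOOLS[name]
    missing = sorted(set(schema) - set(args))
    if missing:
        return False, f"missing arguments: {missing}"
    unexpected = sorted(set(args) - set(schema))
    if unexpected:
        return False, f"unexpected arguments: {unexpected}"

    for key, expected_type in schema.items():
        value = args[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return False, f"argument {key!r} must be {expected_type.__name__}"
    return True, "ok"


print(validate_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
print(validate_tool_call('{"name": "delete_files", "arguments": {"path": "/"}}'))
```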

---

## Citations

### H-Res (Parallel Manifold Steering)

```bibtex
@article{awadhiya2026parallel,
  title={Parallel Manifold Steering: Efficient Adaptation of Large
         Associative Memories via Residual Energy Shaping},
  author={Awadhiya, Kanishk},
  journal={ICLR Workshop on New Frontiers in Associative Memory},
  year={2026},
  url={https://arxiv.org/abs/2606.24396}
}
```

### Uraion-Agent-Steer

```bibtex
@software{uraion-agent-steer,
  title={Uraion-Agent-Steer: Agentic Model via Hierarchical Residual Steering},
  author={Uraion Labs},
  year={2026},
  url={https://huggingface.co/UraionLabs/Uraion-Agent-Steer}
}
```

### Qwen2.5

```bibtex
@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  year={2024},
  publisher={GitHub},
  url={https://github.com/QwenLM/Qwen2.5}
}
```

### TRL

```bibtex
@software{vonwerra2020trl,
  title={{TRL: Transformers Reinforcement Learning}},
  author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and
          Beeching, Edward and Thrush, Tristan and Lambert, Nathan and
          Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  url={https://github.com/huggingface/trl},
  year={2020}
}
```

### Datasets

```bibtex
@misc{hermesfc,
  title={NousResearch Hermes Function Calling},
  author={Nous Research},
  year={2024},
  url={https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1}
}

@misc{xlam2024,
  title={xLAM: A Family of Large Action Models},
  author={Salesforce AI Research},
  year={2024},
  url={https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k}
}

@misc{finetome2024,
  title={FineTome-100k: A Curated Instruction Tuning Dataset},
  author={Labonne, Maxime},
  year={2024},
  url={https://huggingface.co/datasets/mlabonne/FineTome-100k}
}

@misc{apigen2024,
  title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets},
  author={Salesforce AI Research},
  year={2024},
  url={https://huggingface.co/datasets/Salesforce/APIGen-MT-5k}
}

@misc{glaivefc,
  title={Glaive Function Calling v2},
  author={Glaive AI},
  year={2024},
  url={https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2}
}

@misc{toolace2025,
  title={ToolACE: Winning the Points of LLM Function Calling},
  author={Team ACE},
  year={2025},
  url={https://huggingface.co/datasets/Team-ACE/ToolACE}
}
```

---

<p align="center">
<img src="https://uraionlabs.com/public/icons/icon-32.png" alt="" width="24" height="24">
</p>

<p align="center" style="font-family: 'Inter', sans-serif; font-size: 0.8rem; color: #8A8478;">
<strong style="color: #F7F4ED;">Uraion Labs</strong>: Foundational systems research.
<br>
<a href="https://uraionlabs.com" style="color: #E45A1A;">uraionlabs.com</a>
<br><br>
<em style="color: #6F6A61;">
Intelligence is a systems problem.
</em>
<br>
Licensed under <a href="https://www.apache.org/licenses/LICENSE-2.0" style="color: #E45A1A;">Apache 2.0</a>.
</p>

---
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
model_name: uraion-agent-steer
tags:
- generated_from_trainer
- trl
- sft
license: apache-2.0
---

# Model Card for uraion-agent-steer

This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load this repository's checkpoint; device="cuda" requires a CUDA-capable GPU.
generator = pipeline("text-generation", model="UraionLabs/Uraion-Agent-Steer", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with supervised fine-tuning (SFT).

### Framework versions

- TRL: 1.7.0
- Transformers: 5.12.0
- PyTorch: 2.11.0+cu128
- Datasets: 5.0.0
- Tokenizers: 0.22.2

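
To compare a local environment against the versions listed above, something like the following sketch may help. It uses only the standard library's `importlib.metadata`; the `CARD_VERSIONS` mapping and the `compare_environment` helper are names introduced here for illustration, and an exact string match is a deliberately strict assumption (a different CUDA build suffix on PyTorch, for example, will be reported as a difference).

```python
from importlib import metadata

# Versions copied from the "Framework versions" list; keys are PyPI names.
CARD_VERSIONS = {
    "trl": "1.7.0",
    "transformers": "5.12.0",
    "torch": "2.11.0+cu128",
    "datasets": "5.0.0",
    "tokenizers": "0.22.2",
}


def compare_environment(expected):
    """Return {package: (installed_or_None, expected, exact_match)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, wanted, installed == wanted)
    return report


for package, (installed, wanted, same) in compare_environment(CARD_VERSIONS).items():
    status = "match" if same else "differs"
    print(f"{package:<14} installed={installed} card={wanted} [{status}]")
```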
## Citations

Cite TRL as:

```bibtex
@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
```