UraionLabs committed on
Commit
8a89899
·
verified ·
1 Parent(s): 58d3e12

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +29 -560
README.md CHANGED
@@ -1,589 +1,58 @@
1
  ---
2
  base_model: Qwen/Qwen2.5-7B-Instruct
3
- base_model_relation: finetune
4
  library_name: transformers
5
- license: apache-2.0
6
- language:
7
- - en
8
- pipeline_tag: text-generation
9
  tags:
10
- - agent
11
- - function-calling
12
- - tool-use
13
- - h-res
14
- - manifold-steering
15
- - peft
16
- - uraion-labs
17
- - uraion
18
- - iclr-2026
19
- - associative-memory
20
- - hopfield
21
- - neural-collapse
22
- - qwen2.5
23
- - sft
24
  - trl
25
- - hermes-function-calling
26
- - apigen
27
- - xlam
28
- - toolace
29
- datasets:
30
- - NousResearch/hermes-function-calling-v1
31
- - Salesforce/xlam-function-calling-60k
32
- - mlabonne/FineTome-100k
33
- - Salesforce/APIGen-MT-5k
34
- - glaiveai/glaive-function-calling-v2
35
- - Team-ACE/ToolACE
36
- inference:
37
- parameters:
38
- temperature: 0.7
39
- top_p: 0.95
40
- max_new_tokens: 4096
41
- ---
42
-
43
- <p align="center">
44
- <picture>
45
- <source media="(prefers-color-scheme: dark)" srcset="https://uraionlabs.com/public/icons/icon-192.png">
46
- <img src="https://uraionlabs.com/public/icons/icon-192.png" alt="Uraion Labs" width="64" height="64">
47
- </picture>
48
- </p>
49
-
50
- <p align="center">
51
- <strong style="font-family: 'Instrument Serif', Georgia, serif; font-size: 2rem; color: #F7F4ED; letter-spacing: -0.02em;">
52
- Uraion Labs
53
- </strong>
54
- <br>
55
- <span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">Foundational systems research.</span>
56
- </p>
57
-
58
- <p align="center">
59
- <strong style="font-family: 'Inter', sans-serif; font-size: 1.15rem; color: #E45A1A;">
60
- Uraion-Agent-Steer
61
- </strong>
62
- <br>
63
- <span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">
64
- Agentic LLM fine-tuned via Hierarchical Residual Steering (H-Res) — steers activations, not weights.
65
- </span>
66
- </p>
67
-
68
- ---
69
-
70
- **Uraion-Agent-Steer** is a 7-billion parameter model adapted from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) using **H-Res (Hierarchical Residual Steering)** — a novel PEFT method from ["Parallel Manifold Steering"](https://arxiv.org/abs/2606.24396) (ICLR Workshop 2026). Rather than modifying model weights (LoRA) or injecting synthetic tokens (VPT/Prefix Tuning), H-Res learns a **state-dependent vector field** that steers hidden activations into task-specific attractors — preserving the foundation model's associative memory while adapting it for agentic tool use.
71
-
72
- This is a research artifact in Uraion Labs' systems-first approach: studying novel adaptation mechanisms, the harness layer, evaluation, and deployment of agent-capable models. It is the first publicly available model trained with the full H-Res method.
73
-
74
- **Intelligence is a systems problem.** This model is one piece of that system — and the adaptation method itself is part of the research.
75
-
76
- ---
77
-
78
- ## The H-Res Method
79
-
80
- ### The problem with existing PEFT
81
-
82
- | Method | Mechanism | Fatal flaw |
83
- |--------|-----------|------------|
84
- | **LoRA** | Modifies weights globally | Catastrophic interference — distorts retrieval dynamics of pre-trained memories |
85
- | **VPT / Prefix Tuning** | Appends synthetic tokens to input | Buffer congestion — dilutes attention probability mass, weakens associative recall |
86
- | **H-Res** | Steers activations via vector field | *None of the above* — operates orthogonal to weights and input buffer |
87
-
88
- ### How H-Res works
89
-
90
- H-Res frames Transformer adaptation as a **control problem on the activation manifold**. Each layer `l` receives a state-dependent residual:
91
-
92
- ```
93
- z_{l+1} = Attn(z_l) + FFN(z_l) + λ · H_θ(z_l)
94
-
95
- where H_θ(x) = W_up · GeLU(W_down · x)
96
- ```
97
-
98
- - **W_down ∈ ℝ^{d×r}** — projects to a low-rank "control manifold" (bottleneck)
99
- - **W_up ∈ ℝ^{r×d}** — projects the steering signal back to activation space
100
- - **W_up initialized to zero** — no initialization shock; training starts from the pre-trained energy minimum
101
- - **λ** — learnable per-layer scaling factor
102
- - **Applied parallel to self-attention** — via forward hooks, orthogonal to the frozen backbone
103
-
104
- ### Theoretical guarantees (from the paper)
105
-
106
- | Property | Proof |
107
- |----------|-------|
108
- | **Attention entropy preserved** | No synthetic tokens → constant sequence length → H(A_cls) minimal |
109
- | **Neural Collapse facilitated** | Residual adapter acts as Maxwell's Demon, filtering task-irrelevant noise |
110
- | **Zero initialization** | W_up = 0 → H_θ(z) = 0 at t=0 → training starts from global energy minimum |
111
- | **SSM-compatible** | Operates entirely in residual stream — compatible with Mamba, S4, DeltaNet |
112
- | **Multi-task orthogonality** | Null-Space Projection of gradients across tasks (Eq. 6 in paper) |
113
-
114
- ---
115
-
116
- ## Contents
117
-
118
- - [Model Details](#model-details)
119
- - [H-Res Architecture (Deep Dive)](#h-res-architecture-deep-dive)
120
- - [Intended Uses & Limitations](#intended-uses--limitations)
121
- - [Training Data](#training-data)
122
- - [Training Procedure](#training-procedure)
123
- - [Hyperparameters](#hyperparameters)
124
- - [Training Loss](#training-loss)
125
- - [Quickstart](#quickstart)
126
- - [H-Res Adapter Analysis](#h-res-adapter-analysis)
127
- - [Hardware & Infrastructure](#hardware--infrastructure)
128
- - [GGUF Availability](#gguf-availability)
129
- - [Ethical Considerations](#ethical-considerations)
130
- - [Citations](#citations)
131
-
132
- ---
133
-
134
- ## Model Details
135
-
136
- | Property | Value |
137
- |----------|-------|
138
- | **Base model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
139
- | **Architecture** | Qwen2ForCausalLM, a 28-layer pure Transformer (RoPE, SwiGLU, RMSNorm) |
140
- | **Adaptation method** | **H-Res (Hierarchical Residual Steering)** — state-dependent vector field |
141
- | **Context length** | 32,768 tokens (native, inherited) |
142
- | **Parameters** | ~7.6B total, 12.8M H-Res trainable (0.17%) |
143
- | **H-Res rank** | r = 64 per layer |
144
- | **H-Res layers** | 28/28 injected (all layers compatible) |
145
- | **Precision** | BF16 (full precision — no quantization of base model) |
146
- | **License** | Apache 2.0 (inherited from Qwen2.5) |
147
- | **On-disk size** | ~15.3 GB (BF16 safetensors) |
148
- | **Paper** | [arXiv:2606.24396](https://arxiv.org/abs/2606.24396) — ICLR Workshop 2026 |
149
-
150
- ### Architecture choice
151
-
152
- Qwen2.5-7B-Instruct was chosen for this H-Res implementation because:
153
-
154
- 1. **Pure Transformer** — 28 identical decoder layers with standard `input_layernorm` + `self_attn` + `post_attention_layernorm` + `mlp` — cleanest architecture for H-Res hook injection
155
- 2. **Apache 2.0 license** — no gated access, no approval required, fully open
156
- 3. **Strong instruct base** — already instruction-tuned, providing a solid foundation for agentic adaptation
157
- 4. **7B weight class** — punches above its weight on agent benchmarks while fitting comfortably on A100-40GB
158
-
159
- ---
160
-
161
- ## H-Res Architecture (Deep Dive)
162
-
163
- ### Injection mechanism
164
-
165
- H-Res adapters are injected into each transformer layer via **PyTorch forward hooks** — no monkey-patching of forward methods, no model code modification:
166
-
167
- ```
168
- Layer forward (simplified):
169
- ┌─────────────────────────────────────────────┐
170
- │ residual = hidden_states │
171
- │ normed = input_layernorm(hidden_states) │
172
- │ │
173
- │ attn_out = self_attn(normed) ← frozen │
174
- │ hres_out = hres(normed) ← trained │ ← Hook: captures normed, adds to attn output
175
- │ │
176
- │ hidden_states = residual + attn_out + hres_out │
177
- │ hidden_states = hidden_states + mlp(norm(hidden_states)) │
178
- └─────────────────────────────────────────────┘
179
- ```
180
-
181
- ### Per-layer H-Res parameters
182
-
183
- Each of the 28 layers contains:
184
-
185
- ```
186
- HResAdapter:
187
- W_down: Linear(3584 → 64, bias=False) 229,376 params
187
- W_up: Linear(64 → 3584, bias=False) 229,376 params
188
- scale: scalar (learnable) 1 param
189
- ─────────────────────────────────────────────────────
190
- Total per layer: 458,753 params
191
- Total (28 layers): 12,845,084 params
193
- % of base model (7.6B): 0.17%
194
- ```
195
-
196
- ### Initialization (per paper Section 2.3)
197
-
198
- ```python
199
- W_down ~ N(0, 1/d_model) # Normal with σ = 1/√3584
200
- W_up = 0 # Zero — preserves pre-trained energy minimum
201
- scale = 0.1 # Small constant — gentle ramp-up
202
- ```
203
-
204
- At initialization, H_θ(x) = 0 for all x → the model behaves identically to the frozen base. Training gradually "turns on" the steering field.
205
-
206
- ### What H-Res is NOT
207
-
208
- - **NOT LoRA** — doesn't modify frozen weights; computes input-dependent residuals
209
- - **NOT an adapter** — doesn't sit sequentially after attention/MLP; runs *parallel* to self-attention
210
- - **NOT a prompt method** — doesn't add tokens to the input sequence
211
- - **NOT a mixture-of-experts** — all layers are always active; the "expertise" is in the learned vector field
212
-
213
- ---
214
-
215
- ## Intended Uses & Limitations
216
-
217
- ### Intended use
218
-
219
- - **Tool-calling agents** — function calling, API orchestration, multi-turn tool use
220
- - **Agent frameworks** — drop-in replacement for agent runtimes (OpenAI-compatible via vLLM)
221
- - **Systems research** — studying the H-Res adaptation mechanism, its properties, and its limits
222
- - **Associative retrieval tasks** — the H-Res method specifically excels at retrieval (26% better than LoRA on SQuAD per the paper)
223
-
224
- ### Out-of-scope
225
-
226
- - **Production deployment without validation** — research artifact; evaluate on your specific use case
227
- - **High-stakes decision making** — not intended for medical, legal, or financial advice without human oversight
228
- - **Unsupported languages** — trained exclusively on English data
229
- - **Multimodal tasks** — text-only fine-tune
230
-
231
- ### Limitations
232
-
233
- - **Trained for 1 epoch** on ~35K examples. More data/epochs would improve tool-calling reliability.
234
- - **H-Res is a research method** — this is the first public deployment; edge cases may exist.
235
- - **GGUF conversion** — H-Res adapters are state-dependent (nonlinear), so they can't be directly merged into base weights for standard GGUF conversion. A LoRA-distilled GGUF version is available separately.
236
- - **May produce malformed tool calls** in edge cases — validate output before execution.
237
- - **7B weight class** — while punching above its weight, has inherent capacity limits compared to larger models.
238
-
239
- ---
240
-
241
- ## Training Data
242
-
243
- Six datasets were curated for agentic capability — prioritizing function-calling and tool-use signal over raw instruction volume:
244
-
245
- | Dataset | Type | Samples | Focus |
246
- |---------|------|---------|-------|
247
- | [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | Function calling | 1,893 | Single-turn and multi-turn tool use conversations (MIT) |
248
- | [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | Function calling | 10,000 | Diverse API function calling (sampled from 60K, MIT) |
249
- | [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) | Instruction following | 20,000 | General instruct/chat data (sampled from 100K, MIT) |
250
- | [Salesforce/APIGen-MT-5k](https://huggingface.co/datasets/Salesforce/APIGen-MT-5k) | API generation | 5,000 | Multi-turn API call generation across diverse APIs (MIT) |
251
- | [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | Function calling | 8,000 | Multi-turn tool-use conversations (MIT) |
252
- | [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) | Tool use | 8,000 | Agentic tool-use conversations (Apache 2.0) |
253
- | **Total** | | **52,893 raw → 34,893 filtered** | |
254
-
255
- All data formatted via `tokenizer.apply_chat_template()` with the Qwen2.5 ChatML template. Examples without a `user` role were filtered. Sequence length capped at 2,048 tokens.
256
-
257
- ---
258
-
259
- ## Training Procedure
260
-
261
- ### Framework
262
-
263
- - **Training**: HuggingFace TRL `SFTTrainer` with `SFTConfig`
264
- - **Adaptation**: H-Res — custom `HResAdapter` injected via forward hooks (no PEFT library dependency for the core method)
265
- - **Quantization**: None — full BF16 precision for base model (H-Res adds only 0.17% trainable params)
266
- - **Attention**: PyTorch SDPA (`attn_implementation="sdpa"`)
267
- - **Loss**: Standard causal language modeling (no packing)
268
-
269
- ### Pipeline
270
-
271
- 1. **Model loading**: BF16 full precision via `AutoModelForCausalLM.from_pretrained()`
272
- 2. **H-Res injection**: Forward hooks on `input_layernorm` (capture) + `self_attn` (inject)
273
- 3. **Base model freeze**: `model.requires_grad_(False)` — only H-Res params trainable
274
- 4. **Dataset processing**: ShareGPT → ChatML → filtered → concatenated → shuffled
275
- 5. **Training**: `SFTTrainer` with `dataset_text_field="text"`, `packing=False`, `gradient_checkpointing=True`
276
- 6. **Export**: `model.save_pretrained(safe_serialization=True)` — H-Res adapters embedded in model state dict
277
- 7. **Upload**: `HfApi.upload_folder()` → `UraionLabs/Uraion-Agent-Steer`
278
-
279
- ### Novel aspects
280
-
281
- This training represents the **first public implementation** of the full H-Res method:
282
-
283
- - **Hook-based injection** — no model code modification; works with any HuggingFace Transformer
284
- - **Full BF16 precision** — no quantization noise; H-Res is parameter-efficient enough to not need it
285
- - **Learnable scale parameter λ** — per-layer, initialized at 0.1, allowing layers to independently adjust steering intensity
286
- - **Architecture-agnostic** — the same injection code works on Llama, Mistral, Qwen2/3, Gemma, and Phi
287
-
288
- ---
289
-
290
- ## Hyperparameters
291
-
292
- ### H-Res
293
-
294
- | Parameter | Value |
295
- |-----------|-------|
296
- | `r` (bottleneck rank) | 64 |
297
- | `d_model` (hidden size) | 3584 |
298
- | `W_down init` | N(0, 1/d_model) |
299
- | `W_up init` | 0 (zero) |
300
- | `scale init` | 0.1 |
301
- | `activation` | GeLU |
302
- | `bias` | None |
303
-
304
- ### Training
305
-
306
- | Parameter | Value |
307
- |-----------|-------|
308
- | **Sequence length** | 2048 |
309
- | **Effective batch size** | 32 |
310
- | **Per-device batch** | 2 |
311
- | **Gradient accumulation** | 16 |
312
- | **Learning rate** | 1×10⁻⁴ |
313
- | **LR scheduler** | Cosine with warmup |
314
- | **Warmup ratio** | 0.03 |
315
- | **Optimizer** | AdamW 8-bit |
316
- | **Epochs** | 1 |
317
- | **Max steps** | 1,091 |
318
- | **Weight decay** | 0.0 |
319
- | **Gradient checkpointing** | True (non-reentrant) |
320
- | **Precision** | BF16 |
321
- | **Logging steps** | 10 |
322
- | **Save steps** | 50 |
323
- | **Save total limit** | 3 |
324
-
325
  ---
326
 
327
- ## Training Loss
328
-
329
- | Step | Loss | Δ from start | Notes |
330
- |------|------|-------------|-------|
331
- | 10 | 1.310 | — | Initial — H-Res scale still ramping |
332
- | 20 | 1.264 | ↓ 3.5% | W_up beginning to activate |
333
- | 50 | 1.013 | ↓ 22.7% | First checkpoint saved; steering field forming |
334
- | 100 | 0.879 | ↓ 32.9% | Rapid convergence phase |
335
- | 200 | 0.741 | ↓ 43.4% | Entering fine-tuning regime |
336
- | 300 | 0.745 | ↓ 43.1% | Stable convergence |
337
- | 400 | 0.699 | ↓ 46.6% | Steady improvement |
338
- | 500 | 0.689 | ↓ 47.4% | Approaching plateau |
339
- | 600 | 0.645 | ↓ 50.8% | Best single-step loss |
340
- | 700 | 0.688 | ↓ 47.5% | Minor oscillation — normal |
341
- | 800 | 0.646 | ↓ 50.7% | Consistent low-loss regime |
342
- | 900 | 0.663 | ↓ 49.4% | Stable |
343
- | 1000 | 0.67 | ↓ 48.9% | Final stretch |
344
- | **1091** | **0.657** | **↓ 49.8%** | **Final — 50% loss reduction** |
345
-
346
- **Key observations:**
347
- - **Rapid early convergence** — 22.7% loss reduction by step 50 (first 4.6% of training)
348
- - **Smooth learning curve** — no spikes, no divergence, consistent downward trend
349
- - **50% total loss reduction** — from 1.310 to 0.657
350
- - **H-Res's zero-initialization advantage**: with no "initialization shock", the model starts from the frozen base's behavior and improves steadily, apart from small oscillations (for example at steps 300 and 700)
351
-
352
- ---
353
 
354
- ## Quickstart
 
355
 
356
- ### Transformers (recommended for full quality)
357
 
358
  ```python
359
- import torch
360
- from transformers import AutoModelForCausalLM, AutoTokenizer
361
-
362
- model_name = "UraionLabs/Uraion-Agent-Steer"
363
-
364
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
365
- model = AutoModelForCausalLM.from_pretrained(
366
- model_name,
367
- torch_dtype=torch.bfloat16,
368
- device_map="auto",
369
- trust_remote_code=True,
370
- )
371
-
372
- # The model includes H-Res adapters — no extra loading needed
373
- messages = [
374
- {"role": "system", "content": "You are Uraion-Agent-Steer, an agent with tool-use capabilities. Use tools when appropriate."},
375
- {"role": "user", "content": "What's the weather in Tokyo? Should I bring an umbrella?"},
376
- ]
377
-
378
- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
379
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
380
-
381
- outputs = model.generate(
382
- **inputs,
383
- max_new_tokens=512,
384
- temperature=0.7,
385
- top_p=0.95,
386
- do_sample=True,
387
- )
388
- response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
389
- print(response)
390
- ```
391
-
392
- ### With `pipeline`
393
-
394
- ```python
395
- import torch
396
  from transformers import pipeline
397
 
398
- pipe = pipeline(
399
- "text-generation",
400
- model="UraionLabs/Uraion-Agent-Steer",
401
- torch_dtype=torch.bfloat16,
402
- device_map="auto",
403
- trust_remote_code=True,
404
- )
405
-
406
- messages = [
407
- {"role": "system", "content": "You are a helpful agent with access to tools."},
408
- {"role": "user", "content": "Search for the latest AI research papers on arxiv."},
409
- ]
410
- output = pipe(messages, max_new_tokens=512, temperature=0.7, top_p=0.95)
411
- print(output[0]["generated_text"][-1]["content"] if isinstance(output[0]["generated_text"], list) else output[0]["generated_text"])
412
  ```
413
 
414
- ---
415
-
416
- ## H-Res Adapter Analysis
417
-
418
- After training, we inspected the learned H-Res adapters across all 28 layers:
419
 
420
- | Layer | Scale (λ) | ‖W_up‖ | ‖W_down‖ | Steering activity |
421
- |-------|-----------|--------|----------|-------------------|
422
- | 0 (early) | 0.1001 | 0.0000 | 7.94 | **Silent** — shallow layers don't steer |
423
- | 8 (mid) | 0.1001 | 2.12 | 8.45 | Moderate steering |
424
- | 16 (mid-deep) | 0.1001 | 2.87 | 9.12 | Active steering |
425
- | 24 (deep) | 0.1001 | 3.12 | 9.56 | Strong steering |
426
- | 27 (final) | 0.1001 | **3.72** | **9.69** | **Maximum steering** |
427
 
428
- **Key finding:** Steering intensity increases monotonically with layer depth. Early layers (0–3) have W_up ≈ 0 — the adapter is effectively dormant. Deep layers (20–27) have the strongest steering activity. This aligns with the paper's theoretical prediction: H-Res acts primarily on high-level semantic representations in deeper layers, while preserving low-level features in early layers.
429
 
430
- The scale parameter λ stayed at ~0.1 across all layers — the model preferred to learn through W_up/W_down rather than adjusting the global scaling factor.
431
-
432
- ---
433
 
434
- ## Hardware & Infrastructure
435
 
436
- | Component | Detail |
437
- |-----------|--------|
438
- | **Provisioning** | Google Colab CLI (`colab-cli`) via OAuth2 |
439
- | **GPU** | 1× NVIDIA A100-SXM4-40GB |
440
- | **Runtime** | `colab run --gpu A100 --keep --timeout 28800` |
441
- | **Training time** | ~3 hours (1,091 steps at ~10s/step) |
442
- | **VRAM usage** | ~35 GB (7.6B BF16 base + 12.8M H-Res + activations + optimizer) |
443
- | **Setup** | Self-installing dependencies via pip |
444
- | **Session lifecycle** | `colab run` → auto-execute → `--keep` → training → auto-upload → session release |
445
-
446
- Training dependencies auto-installed on Colab: `transformers>=4.57`, `trl>=0.21`, `datasets`, `accelerate`, `safetensors`, `huggingface_hub`.
447
-
448
- ---
449
-
450
- ## GGUF Availability
451
-
452
- H-Res adapters are **state-dependent** (nonlinear function of the input), so they can't be directly merged into base weights for standard GGUF/llama.cpp conversion. A separate **LoRA-distilled version** is available for GGUF users:
453
-
454
- | Format | Repository | Notes |
455
- |--------|-----------|-------|
456
- | **Safetensors (H-Res)** | `UraionLabs/Uraion-Agent-Steer` | This repo — full quality, original H-Res method |
457
- | **GGUF (LoRA-distilled)** | `UraionLabs/Uraion-Agent-Steer-GGUF` | LoRA trained on same data, merged, quantized to all common variants |
458
-
459
- For maximum quality, use this safetensors release. For local llama.cpp/Ollama/LM Studio inference, use the GGUF release.
460
-
461
- ---
462
 
463
- ## Ethical Considerations
464
-
465
- This model is a fine-tune of Qwen2.5-7B-Instruct and inherits its base capabilities and biases:
466
-
467
- - Training data includes user-generated content from HuggingFace datasets, which may contain biases.
468
- - Function-calling capabilities could automate actions without human oversight — always validate tool calls before execution.
469
- - The model has not undergone safety alignment beyond the base model's existing safeguards.
470
- - The H-Res method is novel — long-term behavior and failure modes are still being studied.
471
- - This is a **research-stage artifact** from Uraion Labs. We are a systems research lab, not a product company. Use accordingly.
472
-
473
- ---
474
 
475
  ## Citations
476
 
477
- ### H-Res (Parallel Manifold Steering)
478
-
479
- ```bibtex
480
- @article{awadhiya2026parallel,
481
- title={Parallel Manifold Steering: Efficient Adaptation of Large
482
- Associative Memories via Residual Energy Shaping},
483
- author={Awadhiya, Kanishk},
484
- journal={ICLR Workshop on New Frontiers in Associative Memory},
485
- year={2026},
486
- url={https://arxiv.org/abs/2606.24396}
487
- }
488
- ```
489
 
490
- ### Uraion-Agent-Steer
491
-
492
- ```bibtex
493
- @software{uraion-agent-steer,
494
- title={Uraion-Agent-Steer: Agentic Model via Hierarchical Residual Steering},
495
- author={Uraion Labs},
496
- year={2026},
497
- url={https://huggingface.co/UraionLabs/Uraion-Agent-Steer}
498
- }
499
- ```
500
-
501
- ### Qwen2.5
502
-
503
- ```bibtex
504
- @misc{qwen2.5,
505
- title={Qwen2.5: A Party of Foundation Models},
506
- author={Qwen Team},
507
- year={2025},
508
- publisher={GitHub},
509
- url={https://github.com/QwenLM/Qwen2.5}
510
- }
511
- ```
512
-
513
- ### TRL
514
 
 
 
515
  ```bibtex
516
  @software{vonwerra2020trl,
517
- title={{TRL: Transformers Reinforcement Learning}},
518
- author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and
519
- Beeching, Edward and Thrush, Tristan and Lambert, Nathan and
520
- Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
521
- license={Apache-2.0},
522
- url={https://github.com/huggingface/trl},
523
- year={2020}
524
- }
525
- ```
526
-
527
- ### Datasets
528
-
529
- ```bibtex
530
- @misc{hermesfc,
531
- title={NousResearch Hermes Function Calling},
532
- author={Nous Research},
533
- year={2024},
534
- url={https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1}
535
- }
536
-
537
- @misc{xlam2024,
538
- title={xLAM: A Family of Large Action Models},
539
- author={Salesforce AI Research},
540
- year={2024},
541
- url={https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k}
542
- }
543
-
544
- @misc{finetome2024,
545
- title={FineTome-100k: A Curated Instruction Tuning Dataset},
546
- author={Labonne, Maxime},
547
- year={2024},
548
- url={https://huggingface.co/datasets/mlabonne/FineTome-100k}
549
- }
550
-
551
- @misc{apigen2024,
552
- title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets},
553
- author={Salesforce AI Research},
554
- year={2024},
555
- url={https://huggingface.co/datasets/Salesforce/APIGen-MT-5k}
556
  }
557
-
558
- @misc{glaivefc,
559
- title={Glaive Function Calling v2},
560
- author={Glaive AI},
561
- year={2024},
562
- url={https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2}
563
- }
564
-
565
- @misc{toolace2025,
566
- title={ToolACE: Winning the Points of LLM Function Calling},
567
- author={Team ACE},
568
- year={2025},
569
- url={https://huggingface.co/datasets/Team-ACE/ToolACE}
570
- }
571
- ```
572
-
573
- ---
574
-
575
- <p align="center">
576
- <img src="https://uraionlabs.com/public/icons/icon-32.png" alt="" width="24" height="24">
577
- </p>
578
-
579
- <p align="center" style="font-family: 'Inter', sans-serif; font-size: 0.8rem; color: #8A8478;">
580
- <strong style="color: #F7F4ED;">Uraion Labs</strong> — Foundational systems research.
581
- <br>
582
- <a href="https://uraionlabs.com" style="color: #E45A1A;">uraionlabs.com</a>
583
- <br><br>
584
- <em style="color: #6F6A61;">
585
- Intelligence is a systems problem.
586
- </em>
587
- <br>
588
- Licensed under <a href="https://www.apache.org/licenses/LICENSE-2.0" style="color: #E45A1A;">Apache 2.0</a>.
589
- </p>
 
1
  ---
2
  base_model: Qwen/Qwen2.5-7B-Instruct
 
3
  library_name: transformers
4
+ model_name: uraion-agent-steer
5
  tags:
6
+ - generated_from_trainer
7
  - trl
8
+ - sft
9
+ license: apache-2.0
10
  ---
11
 
12
+ # Model Card for uraion-agent-steer
13
 
14
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
15
+ It has been trained using [TRL](https://github.com/huggingface/trl).
16
 
17
+ ## Quick start
18
 
19
  ```python
20
  from transformers import pipeline
21
 
22
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
23
+ generator = pipeline("text-generation", model="UraionLabs/Uraion-Agent-Steer", device="cuda")
24
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
25
+ print(output["generated_text"])
26
  ```
27
 
28
+ ## Training procedure
29
 
30
+
31
 
 
32
 
33
 
34
+ This model was trained with SFT.
35
 
36
+ ### Framework versions
37
 
38
+ - TRL: 1.7.0
39
+ - Transformers: 5.12.0
40
+ - Pytorch: 2.11.0+cu128
41
+ - Datasets: 5.0.0
42
+ - Tokenizers: 0.22.2
43
 
44
  ## Citations
45
 
46
 
47
 
48
+ Cite TRL as:
49
+
50
  ```bibtex
51
  @software{vonwerra2020trl,
52
+ title = {{TRL: Transformers Reinforcement Learning}},
53
+ author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
54
+ license = {Apache-2.0},
55
+ url = {https://github.com/huggingface/trl},
56
+ year = {2020}
57
  }
58
+ ```
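
The adapter size and step count in the removed README can be re-derived from values it states: `d_model = 3584`, `r = 64`, 28 layers, 34,893 filtered examples, per-device batch 2, and gradient accumulation 16. Below is a minimal sketch of that arithmetic. It assumes each adapter is exactly two bias-free linear maps plus one learnable scalar, that training ran on a single device, and that a final partial batch counts as one optimizer step. The function names are hypothetical helpers for illustration and are not part of TRL or the H-Res code.

```python
import math


def hres_param_count(d_model: int, rank: int, n_layers: int) -> tuple[int, int, int]:
    """Return (params per projection, params per layer, params over all layers)."""
    # W_down and W_up each hold d_model * rank weights (no bias terms).
    per_projection = d_model * rank
    # Two projections plus the single learnable scale per layer.
    per_layer = 2 * per_projection + 1
    return per_projection, per_layer, per_layer * n_layers


def optimizer_steps(n_examples: int, per_device_batch: int, grad_accum: int) -> int:
    """Optimizer steps for one epoch on one device (assumed: partial last batch counts)."""
    effective_batch = per_device_batch * grad_accum
    return math.ceil(n_examples / effective_batch)


if __name__ == "__main__":
    print(hres_param_count(d_model=3584, rank=64, n_layers=28))
    print(optimizer_steps(n_examples=34_893, per_device_batch=2, grad_accum=16))
```

Dividing the all-layer total by the roughly 7.6B base parameters gives the trainable fraction of about 0.17% that the README quotes.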