Instructions to use UraionLabs/Uraion-Agent-Steer with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use UraionLabs/Uraion-Agent-Steer with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="UraionLabs/Uraion-Agent-Steer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("UraionLabs/Uraion-Agent-Steer")
model = AutoModelForCausalLM.from_pretrained("UraionLabs/Uraion-Agent-Steer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use UraionLabs/Uraion-Agent-Steer with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="UraionLabs/Uraion-Agent-Steer",
	filename="Uraion-Agent-Steer-Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use UraionLabs/Uraion-Agent-Steer with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Use Docker

docker model run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M

vLLM

How to use UraionLabs/Uraion-Agent-Steer with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "UraionLabs/Uraion-Agent-Steer"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UraionLabs/Uraion-Agent-Steer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
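
The same request can be issued from Python. The sketch below uses only the standard library and assumes the server started by `vllm serve` is listening on `http://localhost:8000`, as in the curl call above. The helper names `build_chat_request` and `send_chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "UraionLabs/Uraion-Agent-Steer",
    "What is the capital of France?",
)
# print(send_chat(request))  # uncomment once the server is running
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.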

Use Docker

docker model run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M

SGLang

How to use UraionLabs/Uraion-Agent-Steer with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "UraionLabs/Uraion-Agent-Steer" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UraionLabs/Uraion-Agent-Steer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "UraionLabs/Uraion-Agent-Steer" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UraionLabs/Uraion-Agent-Steer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use UraionLabs/Uraion-Agent-Steer with Ollama:
```
ollama run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M
```

Unsloth Studio

How to use UraionLabs/Uraion-Agent-Steer with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for UraionLabs/Uraion-Agent-Steer to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for UraionLabs/Uraion-Agent-Steer to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for UraionLabs/Uraion-Agent-Steer to start chatting

Pi

How to use UraionLabs/Uraion-Agent-Steer with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "UraionLabs/Uraion-Agent-Steer:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use UraionLabs/Uraion-Agent-Steer with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default UraionLabs/Uraion-Agent-Steer:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use UraionLabs/Uraion-Agent-Steer with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "UraionLabs/Uraion-Agent-Steer:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use UraionLabs/Uraion-Agent-Steer with Docker Model Runner:
```
docker model run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M
```

Lemonade

How to use UraionLabs/Uraion-Agent-Steer with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull UraionLabs/Uraion-Agent-Steer:Q4_K_M

Run and chat with the model

lemonade run user.Uraion-Agent-Steer-Q4_K_M

List all available models

lemonade list

UraionLabs committed 5 days ago

Commit

58d3e12

1 Parent(s): 43cbf22

docs: in-depth model card — H-Res architecture, training details, adapter analysis, citations

Files changed (1)

README.md +560 -29

@@ -1,58 +1,589 @@
 ---
 base_model: Qwen/Qwen2.5-7B-Instruct
 library_name: transformers
-model_name: uraion-agent-steer
 tags:
-- generated_from_trainer
-- trl
 - sft
-licence: license
 ---
-# Model Card for uraion-agent-steer
-This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-## Quick start
 ```python
 from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="None", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
 ```
-## Training procedure
-This model was trained with SFT.
-### Framework versions
-- TRL: 1.7.0
-- Transformers: 5.12.0
-- Pytorch: 2.11.0+cu128
-- Datasets: 5.0.0
-- Tokenizers: 0.22.2
 ## Citations
-Cite TRL as:
 ```bibtex
 @software{vonwerra2020trl,
-  title   = {{TRL: Transformers Reinforcement Learning}},
-  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
-  license = {Apache-2.0},
-  url     = {https://github.com/huggingface/trl},
-  year    = {2020}
 }
-```

 ---
 base_model: Qwen/Qwen2.5-7B-Instruct
+base_model_relation: finetune
 library_name: transformers
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
 tags:
+- agent
+- function-calling
+- tool-use
+- h-res
+- manifold-steering
+- peft
+- uraion-labs
+- uraion
+- iclr-2026
+- associative-memory
+- hopfield
+- neural-collapse
+- qwen2.5
 - sft
+- trl
+- hermes-function-calling
+- apigen
+- xlam
+- toolace
+datasets:
+- NousResearch/hermes-function-calling-v1
+- Salesforce/xlam-function-calling-60k
+- mlabonne/FineTome-100k
+- Salesforce/APIGen-MT-5k
+- glaiveai/glaive-function-calling-v2
+- Team-ACE/ToolACE
+inference:
+  parameters:
+    temperature: 0.7
+    top_p: 0.95
+    max_new_tokens: 4096
+---
+<p align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="https://uraionlabs.com/public/icons/icon-192.png">
+    <img src="https://uraionlabs.com/public/icons/icon-192.png" alt="Uraion Labs" width="64" height="64">
+  </picture>
+</p>
+<p align="center">
+  <strong style="font-family: 'Instrument Serif', Georgia, serif; font-size: 2rem; color: #F7F4ED; letter-spacing: -0.02em;">
+    Uraion Labs
+  </strong>
+  <br>
+  <span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">Foundational systems research.</span>
+</p>
+<p align="center">
+  <strong style="font-family: 'Inter', sans-serif; font-size: 1.15rem; color: #E45A1A;">
+    Uraion-Agent-Steer
+  </strong>
+  <br>
+  <span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">
+    Agentic LLM fine-tuned via Hierarchical Residual Steering (H-Res) — steers activations, not weights.
+  </span>
+</p>
+---
+**Uraion-Agent-Steer** is a 7-billion parameter model adapted from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) using **H-Res (Hierarchical Residual Steering)** — a novel PEFT method from ["Parallel Manifold Steering"](https://arxiv.org/abs/2606.24396) (ICLR Workshop 2026). Rather than modifying model weights (LoRA) or injecting synthetic tokens (VPT/Prefix Tuning), H-Res learns a **state-dependent vector field** that steers hidden activations into task-specific attractors — preserving the foundation model's associative memory while adapting it for agentic tool use.
+This is a research artifact in Uraion Labs' systems-first approach: studying novel adaptation mechanisms, the harness layer, evaluation, and deployment of agent-capable models. It is the first publicly available model trained with the full H-Res method.
+**Intelligence is a systems problem.** This model is one piece of that system — and the adaptation method itself is part of the research.
 ---
+## The H-Res Method
+### The problem with existing PEFT
+| Method | Mechanism | Fatal flaw |
+|--------|-----------|------------|
+| **LoRA** | Modifies weights globally | Catastrophic interference — distorts retrieval dynamics of pre-trained memories |
+| **VPT / Prefix Tuning** | Appends synthetic tokens to input | Buffer congestion — dilutes attention probability mass, weakens associative recall |
+| **H-Res** | Steers activations via vector field | *None of the above* — operates orthogonal to weights and input buffer |
+### How H-Res works
+H-Res frames Transformer adaptation as a **control problem on the activation manifold**. Each layer `l` receives a state-dependent residual:
+```
+z_{l+1} = Attn(z_l) + FFN(z_l) + λ · H_θ(z_l)
+where  H_θ(x) = W_up · GeLU(W_down · x)
+```
+- **W_down ∈ ℝ^{r×d}**: projects to a low-rank "control manifold" (bottleneck)
+- **W_up ∈ ℝ^{d×r}**: projects the steering signal back to activation space
+- **W_up initialized to zero**: no initialization shock; training starts from the pre-trained energy minimum
+- **λ**: learnable per-layer scaling factor
+- **Applied parallel to self-attention**: via forward hooks, orthogonal to the frozen backbone
+### Theoretical guarantees (from the paper)
+| Property | Proof |
+|----------|-------|
+| **Attention entropy preserved** | No synthetic tokens → constant sequence length → H(A_cls) minimal |
+| **Neural Collapse facilitated** | Residual adapter acts as Maxwell's Demon, filtering task-irrelevant noise |
+| **Zero initialization** | W_up = 0 → H_θ(z) = 0 at t=0 → training starts from global energy minimum |
+| **SSM-compatible** | Operates entirely in residual stream — compatible with Mamba, S4, DeltaNet |
+| **Multi-task orthogonality** | Null-Space Projection of gradients across tasks (Eq. 6 in paper) |
+---
+## Contents
+- [Model Details](#model-details)
+- [H-Res Architecture (Deep Dive)](#h-res-architecture-deep-dive)
+- [Intended Uses & Limitations](#intended-uses--limitations)
+- [Training Data](#training-data)
+- [Training Procedure](#training-procedure)
+- [Hyperparameters](#hyperparameters)
+- [Training Loss](#training-loss)
+- [Quickstart](#quickstart)
+- [H-Res Adapter Analysis](#h-res-adapter-analysis)
+- [Hardware & Infrastructure](#hardware--infrastructure)
+- [GGUF Availability](#gguf-availability)
+- [Ethical Considerations](#ethical-considerations)
+- [Citations](#citations)
+---
+## Model Details
+| Property | Value |
+|----------|-------|
+| **Base model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
+| **Architecture** | `Qwen2ForCausalLM` (Qwen2.5 family), 28-layer pure Transformer (RoPE, SwiGLU, RMSNorm) |
+| **Adaptation method** | **H-Res (Hierarchical Residual Steering)** — state-dependent vector field |
+| **Context length** | 32,768 tokens (native, inherited) |
+| **Parameters** | ~7.6B total, 12.8M H-Res trainable (0.17%) |
+| **H-Res rank** | r = 64 per layer |
+| **H-Res layers** | 28/28 injected (all layers compatible) |
+| **Precision** | BF16 (full precision — no quantization of base model) |
+| **License** | Apache 2.0 (inherited from Qwen2.5) |
+| **On-disk size** | ~15.3 GB (BF16 safetensors) |
+| **Paper** | [arXiv:2606.24396](https://arxiv.org/abs/2606.24396) — ICLR Workshop 2026 |
+### Architecture choice
+Qwen2.5-7B-Instruct was chosen for this H-Res implementation because:
+1. **Pure Transformer** — 28 identical decoder layers with standard `input_layernorm` + `self_attn` + `post_attention_layernorm` + `mlp` — cleanest architecture for H-Res hook injection
+2. **Apache 2.0 license** — no gated access, no approval required, fully open
+3. **Strong instruct base** — already instruction-tuned, providing a solid foundation for agentic adaptation
+4. **7B weight class** — punches above its weight on agent benchmarks while fitting comfortably on A100-40GB
+---
+## H-Res Architecture (Deep Dive)
+### Injection mechanism
+H-Res adapters are injected into each transformer layer via **PyTorch forward hooks** — no monkey-patching of forward methods, no model code modification:
+```
+Layer forward (simplified):
+  ┌─────────────────────────────────────────────┐
+  │ residual = hidden_states                     │
+  │ normed = input_layernorm(hidden_states)      │
+  │                                              │
+  │ attn_out = self_attn(normed)     ← frozen   │
+  │ hres_out = hres(normed)          ← trained  │  ← Hook: captures normed, adds to attn output
+  │                                              │
+  │ hidden_states = residual + attn_out + hres_out │
+  │ hidden_states = hidden_states + mlp(norm(hidden_states)) │
+  └─────────────────────────────────────────────┘
+```
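The capture-and-inject pattern in the diagram can be sketched with PyTorch forward hooks on a toy layer. This is an illustration under stated assumptions, not the authors' training code: `ToyLayer` is a hypothetical stand-in whose "attention" is a single `Linear` returning a tensor, and `HResAdapter` and `inject_hres` are names chosen here. In Transformers, attention modules typically return a tuple, so a real hook would need to modify the first element only.

```python
import torch
from torch import nn


class HResAdapter(nn.Module):
    """Low-rank steering branch: scale * W_up(GeLU(W_down(x)))."""

    def __init__(self, d_model: int, rank: int, scale_init: float = 0.1):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)
        self.scale = nn.Parameter(torch.tensor(scale_init))
        nn.init.normal_(self.down.weight, std=d_model ** -0.5)
        nn.init.zeros_(self.up.weight)  # zero init: the branch starts silent

    def forward(self, x):
        return self.scale * self.up(nn.functional.gelu(self.down(x)))


class ToyLayer(nn.Module):
    """Hypothetical stand-in for a decoder layer (attention replaced by a Linear)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(d_model)
        self.self_attn = nn.Linear(d_model, d_model)

    def forward(self, hidden_states):
        normed = self.input_layernorm(hidden_states)
        return hidden_states + self.self_attn(normed)


def inject_hres(layer: nn.Module, adapter: HResAdapter):
    """Attach capture and inject hooks; returns the handles for later removal."""
    captured = {}

    def capture(module, args, output):
        captured["normed"] = output  # remember the normalized activations

    def inject(module, args, output):
        # A non-None return value replaces the module's output
        return output + adapter(captured["normed"])

    return [
        layer.input_layernorm.register_forward_hook(capture),
        layer.self_attn.register_forward_hook(inject),
    ]


torch.manual_seed(0)
layer, adapter = ToyLayer(16), HResAdapter(16, rank=4)
x = torch.randn(2, 5, 16)
baseline = layer(x)                  # frozen layer, no hooks
handles = inject_hres(layer, adapter)
steered = layer(x)                   # identical at init because W_up is zero
```

Calling `handle.remove()` on each returned handle restores the original layer, which is one practical benefit of hooks over patching `forward`.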
+### Per-layer H-Res parameters
+Each of the 28 layers contains:
+```
+HResAdapter:
+  W_down: Linear(3584 → 64, bias=False)   229,376 params
+  W_up:   Linear(64 → 3584, bias=False)   229,376 params
+  scale:  scalar (learnable)                    1 param
+  ─────────────────────────────────────────────────────
+  Total per layer:                        458,753 params
+  Total (28 layers):                   12,845,084 params
+  % of base model (7.6B):                    0.17%
+```
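As a sanity check, the parameter budget can be recomputed from the stated layer shapes alone. The constants below come from the card (hidden size 3584, rank 64, 28 layers, roughly 7.6B base parameters); the variable names are illustrative.

```python
D_MODEL = 3584        # hidden size of Qwen2.5-7B
RANK = 64             # H-Res bottleneck rank
NUM_LAYERS = 28
BASE_PARAMS = 7.6e9   # approximate base-model size quoted in the card

w_down = D_MODEL * RANK          # Linear(3584 -> 64, bias=False)
w_up = RANK * D_MODEL            # Linear(64 -> 3584, bias=False)
per_layer = w_down + w_up + 1    # plus one learnable scale
total = per_layer * NUM_LAYERS
fraction = total / BASE_PARAMS

print(f"{w_down:,} per projection, {per_layer:,} per layer, {total:,} total")
print(f"{fraction:.2%} of the base model")
```

A bias-free `Linear` between dimensions 3584 and 64 holds 3584 × 64 = 229,376 weights, so the total lands near 12.8M, or about 0.17% of the base model.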
+### Initialization (per paper Section 2.3)
 ```python
+W_down ~ N(0, 1/d_model)     # Normal with σ = 1/√3584
+W_up   = 0                    # Zero — preserves pre-trained energy minimum
+scale  = 0.1                  # Small constant — gentle ramp-up
+```
+At initialization, H_θ(x) = 0 for all x → the model behaves identically to the frozen base. Training gradually "turns on" the steering field.
+### What H-Res is NOT
+- **NOT LoRA**: doesn't modify frozen weights; computes input-dependent residuals
+- **NOT a sequential adapter**: doesn't sit after attention/MLP; runs *parallel* to self-attention
+- **NOT a prompt method**: doesn't add tokens to the input sequence
+- **NOT a mixture-of-experts**: all layers are always active; the "expertise" is in the learned vector field
+---
+## Intended Uses & Limitations
+### Intended use
+- **Tool-calling agents** — function calling, API orchestration, multi-turn tool use
+- **Agent frameworks** — drop-in replacement for agent runtimes (OpenAI-compatible via vLLM)
+- **Systems research** — studying the H-Res adaptation mechanism, its properties, and its limits
+- **Associative retrieval tasks** — the H-Res method specifically excels at retrieval (26% better than LoRA on SQuAD per the paper)
+### Out-of-scope
+- **Production deployment without validation** — research artifact; evaluate on your specific use case
+- **High-stakes decision making** — not intended for medical, legal, or financial advice without human oversight
+- **Unsupported languages** — trained exclusively on English data
+- **Multimodal tasks** — text-only fine-tune
+### Limitations
+- **Trained for 1 epoch** on ~35K examples. More data/epochs would improve tool-calling reliability.
+- **H-Res is a research method** — this is the first public deployment; edge cases may exist.
+- **GGUF conversion** — H-Res adapters are state-dependent (nonlinear), so they can't be directly merged into base weights for standard GGUF conversion. A LoRA-distilled GGUF version is available separately.
+- **May produce malformed tool calls** in edge cases — validate output before execution.
+- **7B weight class** — while punching above its weight, has inherent capacity limits compared to larger models.
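Because malformed tool calls are possible, a small validation step before execution is worthwhile. The sketch below assumes the model emits Hermes-style `<tool_call>{...}</tool_call>` blocks with `name` and `arguments` keys, which is how the Qwen2.5 chat template is commonly used; confirm the format against your own outputs. `parse_tool_calls` is a hypothetical helper, not part of any library.

```python
import json
import re

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text, allowed_tools):
    """Return (valid_calls, errors) for every <tool_call> block in text."""
    valid, errors = [], []
    for raw in TOOL_CALL_PATTERN.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"invalid JSON: {exc.msg}")
            continue
        if not isinstance(call, dict):
            errors.append("tool call is not a JSON object")
            continue
        name = call.get("name")
        if not isinstance(name, str) or name not in allowed_tools:
            errors.append(f"unknown tool: {name!r}")
        elif not isinstance(call.get("arguments"), dict):
            errors.append("arguments must be a JSON object")
        else:
            valid.append(call)
    return valid, errors


sample = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": }\n</tool_call>'
)
valid, errors = parse_tool_calls(sample, {"get_weather"})
print(valid)
print(errors)
```

Only calls in `valid` should reach the tool executor; entries in `errors` can be logged or fed back to the model for a retry.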
+---
+## Training Data
+Six datasets were curated for agentic capability — prioritizing function-calling and tool-use signal over raw instruction volume:
+| Dataset | Type | Samples | Focus |
+|---------|------|---------|-------|
+| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | Function calling | 1,893 | Single-turn and multi-turn tool use conversations (MIT) |
+| [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | Function calling | 10,000 | Diverse API function calling (sampled from 60K, MIT) |
+| [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) | Instruction following | 20,000 | General instruct/chat data (sampled from 100K, MIT) |
+| [Salesforce/APIGen-MT-5k](https://huggingface.co/datasets/Salesforce/APIGen-MT-5k) | API generation | 5,000 | Multi-turn API call generation across diverse APIs (MIT) |
+| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | Function calling | 8,000 | Multi-turn tool-use conversations (MIT) |
+| [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) | Tool use | 8,000 | Agentic tool-use conversations (Apache 2.0) |
+| **Total** | | **52,893 raw → 34,893 filtered** | |
+All data formatted via `tokenizer.apply_chat_template()` with the Qwen2.5 ChatML template. Examples without a `user` role were filtered. Sequence length capped at 2,048 tokens.
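The user-role filter described above might look like the following. This is a minimal sketch that assumes each conversation is already a list of `{"role", "content"}` dictionaries; the function names are illustrative and the actual preprocessing script is not shown in the card.

```python
def has_user_turn(messages):
    """True if at least one message in the conversation has role 'user'."""
    return any(message.get("role") == "user" for message in messages)


def filter_conversations(conversations):
    """Keep only conversations that contain a user turn."""
    return [conv for conv in conversations if has_user_turn(conv)]


conversations = [
    [{"role": "system", "content": "You are a helpful agent."},
     {"role": "user", "content": "Book a flight to Oslo."},
     {"role": "assistant", "content": "Calling the booking tool."}],
    [{"role": "system", "content": "Tool schema only."},
     {"role": "assistant", "content": "No user message here."}],
]
kept = filter_conversations(conversations)
print(len(kept), "of", len(conversations), "conversations kept")
```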
+---
+## Training Procedure
+### Framework
+- **Training**: HuggingFace TRL `SFTTrainer` with `SFTConfig`
+- **Adaptation**: H-Res — custom `HResAdapter` injected via forward hooks (no PEFT library dependency for the core method)
+- **Quantization**: None — full BF16 precision for base model (H-Res adds only 0.17% trainable params)
+- **Attention**: PyTorch SDPA (`attn_implementation="sdpa"`)
+- **Loss**: Standard causal language modeling (no packing)
+### Pipeline
+1. **Model loading**: BF16 full precision via `AutoModelForCausalLM.from_pretrained()`
+2. **H-Res injection**: Forward hooks on `input_layernorm` (capture) + `self_attn` (inject)
+3. **Base model freeze**: `model.requires_grad_(False)` — only H-Res params trainable
+4. **Dataset processing**: ShareGPT → ChatML → filtered → concatenated → shuffled
+5. **Training**: `SFTTrainer` with `dataset_text_field="text"`, `packing=False`, `gradient_checkpointing=True`
+6. **Export**: `model.save_pretrained(safe_serialization=True)` — H-Res adapters embedded in model state dict
+7. **Upload**: `HfApi.upload_folder()` → `UraionLabs/Uraion-Agent-Steer`
+### Novel aspects
+This training represents the **first public implementation** of the full H-Res method:
+- **Hook-based injection** — no model code modification; works with any HuggingFace Transformer
+- **Full BF16 precision** — no quantization noise; H-Res is parameter-efficient enough to not need it
+- **Learnable scale parameter λ** — per-layer, initialized at 0.1, allowing layers to independently adjust steering intensity
+- **Architecture-agnostic** — the same injection code works on Llama, Mistral, Qwen2/3, Gemma, and Phi
+---
+## Hyperparameters
+### H-Res
+| Parameter | Value |
+|-----------|-------|
+| `r` (bottleneck rank) | 64 |
+| `d_model` (hidden size) | 3584 |
+| `W_down init` | N(0, 1/d_model) |
+| `W_up init` | 0 (zero) |
+| `scale init` | 0.1 |
+| `activation` | GeLU |
+| `bias` | None |
+### Training
+| Parameter | Value |
+|-----------|-------|
+| **Sequence length** | 2048 |
+| **Effective batch size** | 32 |
+| **Per-device batch** | 2 |
+| **Gradient accumulation** | 16 |
+| **Learning rate** | 1×10⁻⁴ |
+| **LR scheduler** | Cosine with warmup |
+| **Warmup ratio** | 0.03 |
+| **Optimizer** | AdamW 8-bit |
+| **Epochs** | 1 |
+| **Max steps** | 1,091 |
+| **Weight decay** | 0.0 |
+| **Gradient checkpointing** | True (non-reentrant) |
+| **Precision** | BF16 |
+| **Logging steps** | 10 |
+| **Save steps** | 50 |
+| **Save total limit** | 3 |
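The batch and step figures in the table are related by simple arithmetic. The sketch below assumes a single GPU (as listed under hardware) and the 34,893 filtered examples from the training data section, and it assumes warmup steps are rounded up, which is an assumption about the trainer rather than something the card states.

```python
import math

per_device_batch = 2
grad_accum = 16
num_examples = 34_893   # filtered examples, one epoch
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum            # single GPU assumed
steps_per_epoch = math.ceil(num_examples / effective_batch)
warmup_steps = math.ceil(warmup_ratio * steps_per_epoch)   # rounding up is an assumption

print(effective_batch, steps_per_epoch, warmup_steps)
```

With these inputs the step count matches the 1,091 max steps reported in the table.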
+---
+## Training Loss
+| Step | Loss | Δ from start | Notes |
+|------|------|-------------|-------|
+| 10 | 1.310 | — | Initial — H-Res scale still ramping |
+| 20 | 1.264 | ↓ 3.5% | W_up beginning to activate |
+| 50 | 1.013 | ↓ 22.7% | First checkpoint saved; steering field forming |
+| 100 | 0.879 | ↓ 32.9% | Rapid convergence phase |
+| 200 | 0.741 | ↓ 43.4% | Entering fine-tuning regime |
+| 300 | 0.745 | ↓ 43.1% | Stable convergence |
+| 400 | 0.699 | ↓ 46.6% | Steady improvement |
+| 500 | 0.689 | ↓ 47.4% | Approaching plateau |
+| 600 | 0.645 | ↓ 50.8% | Best single-step loss |
+| 700 | 0.688 | ↓ 47.5% | Minor oscillation — normal |
+| 800 | 0.646 | ↓ 50.7% | Consistent low-loss regime |
+| 900 | 0.663 | ↓ 49.4% | Stable |
+| 1000 | 0.670 | ↓ 48.9% | Final stretch |
+| **1091** | **0.657** | **↓ 49.8%** | **Final — 50% loss reduction** |
+**Key observations:**
+- **Rapid early convergence**: 22.7% loss reduction by step 50 (first 4.6% of training)
+- **Mostly smooth learning curve**: no spikes or divergence; the logged loss trends downward with small upticks at steps 300, 700, 900, and 1000
+- **About 50% total loss reduction**: from 1.310 to 0.657 (49.8%)
+- **H-Res's zero-initialization advantage**: no "initialization shock", so training starts from the base model's behavior and improves steadily
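The percentages in the table can be reproduced from the logged losses. The short script below recomputes the reduction relative to step 10, finds the lowest logged loss, and lists the steps where the loss rose compared with the previous logged row. Only values from the table are used.

```python
losses = {
    10: 1.310, 20: 1.264, 50: 1.013, 100: 0.879, 200: 0.741,
    300: 0.745, 400: 0.699, 500: 0.689, 600: 0.645, 700: 0.688,
    800: 0.646, 900: 0.663, 1000: 0.670, 1091: 0.657,
}

start = losses[10]
# Percentage reduction relative to the first logged loss
reduction = {step: (start - loss) / start * 100 for step, loss in losses.items()}
best_step = min(losses, key=losses.get)

steps = sorted(losses)
# Steps whose loss is higher than the previous logged row
upticks = [cur for prev, cur in zip(steps, steps[1:]) if losses[cur] > losses[prev]]

for step in steps:
    print(f"step {step:>4}: loss {losses[step]:.3f}, down {reduction[step]:.1f}%")
print("best logged step:", best_step)
print("upticks at:", upticks)
```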
+---
+## Quickstart
+### Transformers (recommended for full quality)
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "UraionLabs/Uraion-Agent-Steer"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+# The model includes H-Res adapters — no extra loading needed
+messages = [
+    {"role": "system", "content": "You are Uraion-Agent-Steer, an agent with tool-use capabilities. Use tools when appropriate."},
+    {"role": "user", "content": "What's the weather in Tokyo? Should I bring an umbrella?"},
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512,
+    temperature=0.7,
+    top_p=0.95,
+    do_sample=True,
+)
+response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+print(response)
+```
+### With `pipeline`
+```python
+import torch
 from transformers import pipeline
+pipe = pipeline(
+    "text-generation",
+    model="UraionLabs/Uraion-Agent-Steer",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+messages = [
+    {"role": "system", "content": "You are a helpful agent with access to tools."},
+    {"role": "user", "content": "Search for the latest AI research papers on arxiv."},
+]
+output = pipe(messages, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95)
+generated = output[0]["generated_text"]
+# Chat input returns the full message list; the last entry is the assistant reply
+print(generated[-1]["content"] if isinstance(generated, list) else generated)
 ```
+---
+## H-Res Adapter Analysis
+After training, we inspected the learned H-Res adapters across all 28 layers:
+| Layer | Scale (λ) | ‖W_up‖ | ‖W_down‖ | Steering activity |
+|-------|-----------|--------|----------|-------------------|
+| 0 (early) | 0.1001 | 0.0000 | 7.94 | **Silent** — shallow layers don't steer |
+| 8 (mid) | 0.1001 | 2.12 | 8.45 | Moderate steering |
+| 16 (mid-deep) | 0.1001 | 2.87 | 9.12 | Active steering |
+| 24 (deep) | 0.1001 | 3.12 | 9.56 | Strong steering |
+| 27 (final) | 0.1001 | **3.72** | **9.69** | **Maximum steering** |
+**Key finding:** Steering intensity increases monotonically with layer depth. Early layers (0–3) have W_up ≈ 0 — the adapter is effectively dormant. Deep layers (20–27) have the strongest steering activity. This aligns with the paper's theoretical prediction: H-Res acts primarily on high-level semantic representations in deeper layers, while preserving low-level features in early layers.
+The scale parameter λ stayed at ~0.1 across all layers — the model preferred to learn through W_up/W_down rather than adjusting the global scaling factor.
+---
+## Hardware & Infrastructure
+| Component | Detail |
+|-----------|--------|
+| **Provisioning** | Google Colab CLI (`colab-cli`) via OAuth2 |
+| **GPU** | 1× NVIDIA A100-SXM4-40GB |
+| **Runtime** | `colab run --gpu A100 --keep --timeout 28800` |
+| **Training time** | ~3 hours (1,091 steps at ~10s/step) |
+| **VRAM usage** | ~35 GB (7.6B BF16 base + 12.8M H-Res + activations + optimizer) |
+| **Setup** | Self-installing dependencies via pip |
+| **Session lifecycle** | `colab run` → auto-execute → `--keep` → training → auto-upload → session release |
+Training dependencies auto-installed on Colab: `transformers>=4.57`, `trl>=0.21`, `datasets`, `accelerate`, `safetensors`, `huggingface_hub`.
+---
+## GGUF Availability
+H-Res adapters are **state-dependent** (nonlinear function of the input), so they can't be directly merged into base weights for standard GGUF/llama.cpp conversion. A separate **LoRA-distilled version** is available for GGUF users:
+| Format | Repository | Notes |
+|--------|-----------|-------|
+| **Safetensors (H-Res)** | `UraionLabs/Uraion-Agent-Steer` | This repo — full quality, original H-Res method |
+| **GGUF (LoRA-distilled)** | `UraionLabs/Uraion-Agent-Steer-GGUF` | LoRA trained on same data, merged, quantized to all common variants |
+For maximum quality, use this safetensors release. For local llama.cpp/Ollama/LM Studio inference, use the GGUF release.
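Why a linear update can be merged while the H-Res branch cannot is easy to see on a toy example. A LoRA-style update `up @ (down @ x)` is linear, so it equals a single matrix that can be added to the base weight. Any such matrix must map `-x` to the negated output. The GeLU inside the H-Res branch breaks that property. The weights below are toy values chosen only for illustration.

```python
import math


def gelu(v: float) -> float:
    """Exact GeLU using the Gaussian error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def linear_update(x, down, up):
    """LoRA-style branch: up @ (down @ x), linear in x."""
    return matvec(up, matvec(down, x))


def steering_update(x, down, up):
    """H-Res-style branch: up @ GeLU(down @ x), nonlinear in x."""
    return matvec(up, [gelu(h) for h in matvec(down, x)])


# Toy weights (d = 2, r = 1), not taken from the model
w_down = [[1.0, -1.0]]
w_up = [[1.0], [0.5]]
x = [1.0, 0.0]
neg_x = [-v for v in x]

linear_pos = linear_update(x, w_down, w_up)
linear_neg = linear_update(neg_x, w_down, w_up)
steer_pos = steering_update(x, w_down, w_up)
steer_neg = steering_update(neg_x, w_down, w_up)

print("linear:  ", linear_pos, linear_neg)
print("steering:", steer_pos, steer_neg)
```

The linear branch returns exactly negated outputs for `x` and `-x`, while the steering branch does not, so no fixed weight delta can reproduce it for all inputs.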
+---
+## Ethical Considerations
+This model is a fine-tune of Qwen2.5-7B-Instruct and inherits its base capabilities and biases:
+- Training data includes user-generated content from HuggingFace datasets, which may contain biases.
+- Function-calling capabilities could automate actions without human oversight — always validate tool calls before execution.
+- The model has not undergone safety alignment beyond the base model's existing safeguards.
+- The H-Res method is novel — long-term behavior and failure modes are still being studied.
+- This is a **research-stage artifact** from Uraion Labs. We are a systems research lab, not a product company. Use accordingly.
+---
 ## Citations
+### H-Res (Parallel Manifold Steering)
+```bibtex
+@article{awadhiya2026parallel,
+  title={Parallel Manifold Steering: Efficient Adaptation of Large
+         Associative Memories via Residual Energy Shaping},
+  author={Awadhiya, Kanishk},
+  journal={ICLR Workshop on New Frontiers in Associative Memory},
+  year={2026},
+  url={https://arxiv.org/abs/2606.24396}
+}
+```
+### Uraion-Agent-Steer
+```bibtex
+@software{uraion-agent-steer,
+  title={Uraion-Agent-Steer: Agentic Model via Hierarchical Residual Steering},
+  author={Uraion Labs},
+  year={2026},
+  url={https://huggingface.co/UraionLabs/Uraion-Agent-Steer}
+}
+```
+### Qwen2.5
+```bibtex
+@misc{qwen2.5,
+  title={Qwen2.5: A Party of Foundation Models},
+  author={Qwen Team},
+  year={2024},
+  publisher={GitHub},
+  url={https://github.com/QwenLM/Qwen2.5}
+}
+```
+### TRL
 ```bibtex
 @software{vonwerra2020trl,
+  title={{TRL: Transformers Reinforcement Learning}},
+  author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and
+          Beeching, Edward and Thrush, Tristan and Lambert, Nathan and
+          Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license={Apache-2.0},
+  url={https://github.com/huggingface/trl},
+  year={2020}
+}
+```
+### Datasets
+```bibtex
+@misc{hermesfc,
+  title={NousResearch Hermes Function Calling},
+  author={Nous Research},
+  year={2024},
+  url={https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1}
+}
+@misc{xlam2024,
+  title={xLAM: A Family of Large Action Models},
+  author={Salesforce AI Research},
+  year={2024},
+  url={https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k}
+}
+@misc{finetome2024,
+  title={FineTome-100k: A Curated Instruction Tuning Dataset},
+  author={Labonne, Maxime},
+  year={2024},
+  url={https://huggingface.co/datasets/mlabonne/FineTome-100k}
+}
+@misc{apigen2024,
+  title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets},
+  author={Salesforce AI Research},
+  year={2024},
+  url={https://huggingface.co/datasets/Salesforce/APIGen-MT-5k}
 }
+@misc{glaivefc,
+  title={Glaive Function Calling v2},
+  author={Glaive AI},
+  year={2024},
+  url={https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2}
+}
+@misc{toolace2025,
+  title={ToolACE: Winning the Points of LLM Function Calling},
+  author={Team ACE},
+  year={2025},
+  url={https://huggingface.co/datasets/Team-ACE/ToolACE}
+}
+```
+---
+<p align="center">
+  <img src="https://uraionlabs.com/public/icons/icon-32.png" alt="" width="24" height="24">
+</p>
+<p align="center" style="font-family: 'Inter', sans-serif; font-size: 0.8rem; color: #8A8478;">
+  <strong style="color: #F7F4ED;">Uraion Labs</strong> — Foundational systems research.
+  <br>
+  <a href="https://uraionlabs.com" style="color: #E45A1A;">uraionlabs.com</a>
+  <br><br>
+  <em style="color: #6F6A61;">
+    Intelligence is a systems problem.
+  </em>
+  <br>
+  Licensed under <a href="https://www.apache.org/licenses/LICENSE-2.0" style="color: #E45A1A;">Apache 2.0</a>.
+</p>