Instructions for using UraionLabs/Uraion-Agent-Steer with libraries and local apps.

Libraries

How to use UraionLabs/Uraion-Agent-Steer with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="UraionLabs/Uraion-Agent-Steer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("UraionLabs/Uraion-Agent-Steer")
model = AutoModelForCausalLM.from_pretrained("UraionLabs/Uraion-Agent-Steer")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use UraionLabs/Uraion-Agent-Steer with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="UraionLabs/Uraion-Agent-Steer",
	filename="Uraion-Agent-Steer-Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
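
The call above returns the completion instead of printing it. As a minimal sketch, assuming the OpenAI-style response layout that `create_chat_completion` returns (a dict whose `choices` entries each hold a `message`), the reply text could be extracted as shown here. The helper name `first_reply` and the sample dict are illustrative and not part of llama-cpp-python.

```python
from typing import Any, Mapping


def first_reply(response: Mapping[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for a real response, used only for illustration.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(first_reply(sample_response))
```

With a loaded model, the same helper would wrap the call directly: `first_reply(llm.create_chat_completion(messages=[...]))`.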

Local Apps

llama.cpp

How to use UraionLabs/Uraion-Agent-Steer with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Use Docker

docker model run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M

vLLM

How to use UraionLabs/Uraion-Agent-Steer with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "UraionLabs/Uraion-Agent-Steer"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UraionLabs/Uraion-Agent-Steer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
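
The same request can be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. The function names `build_chat_request` and `send_chat_request` are hypothetical helpers written for this example, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "UraionLabs/Uraion-Agent-Steer",
    "What is the capital of France?",
)
# send_chat_request(request) only works while the server is running,
# so it is not called here.
```

Building and sending are kept separate so the payload can be inspected without a running server. Changing the base URL to `http://localhost:30000` would target a server on that port instead.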

Use Docker

docker model run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M

SGLang

How to use UraionLabs/Uraion-Agent-Steer with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "UraionLabs/Uraion-Agent-Steer" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UraionLabs/Uraion-Agent-Steer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "UraionLabs/Uraion-Agent-Steer" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "UraionLabs/Uraion-Agent-Steer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use UraionLabs/Uraion-Agent-Steer with Ollama:
```
ollama run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M
```

Unsloth Studio

How to use UraionLabs/Uraion-Agent-Steer with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for UraionLabs/Uraion-Agent-Steer to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for UraionLabs/Uraion-Agent-Steer to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for UraionLabs/Uraion-Agent-Steer to start chatting

Pi

How to use UraionLabs/Uraion-Agent-Steer with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "UraionLabs/Uraion-Agent-Steer:Q4_K_M"
        }
      ]
    }
  }
}
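
If `models.json` already lists other providers, editing it by hand risks overwriting them. As a rough sketch, the entry above could be merged in with a short script. The function name `add_llama_cpp_provider` is hypothetical, and the demo writes to a temporary directory rather than to `~/.pi/agent/models.json`.

```python
import json
import tempfile
from pathlib import Path


def add_llama_cpp_provider(
    config_path: Path,
    model_id: str,
    base_url: str = "http://localhost:8080/v1",
) -> dict:
    """Merge a llama-cpp provider entry into a models.json file.

    Other providers already in the file are kept; an existing
    "llama-cpp" entry is replaced.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file so nothing in the home directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "agent" / "models.json"
    config = add_llama_cpp_provider(demo_path, "UraionLabs/Uraion-Agent-Steer:Q4_K_M")
    written = json.loads(demo_path.read_text())
    print(json.dumps(written, indent=2))
```

For real use, pass `Path.home() / ".pi" / "agent" / "models.json"` as `config_path`.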

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use UraionLabs/Uraion-Agent-Steer with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default UraionLabs/Uraion-Agent-Steer:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use UraionLabs/Uraion-Agent-Steer with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf UraionLabs/Uraion-Agent-Steer:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "UraionLabs/Uraion-Agent-Steer:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use UraionLabs/Uraion-Agent-Steer with Docker Model Runner:
```
docker model run hf.co/UraionLabs/Uraion-Agent-Steer:Q4_K_M
```

Lemonade

How to use UraionLabs/Uraion-Agent-Steer with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull UraionLabs/Uraion-Agent-Steer:Q4_K_M

Run and chat with the model

lemonade run user.Uraion-Agent-Steer-Q4_K_M

List all available models

lemonade list

UraionLabs committed 4 days ago

Commit 8a89899 (verified) · 1 parent: 58d3e12

Upload README.md with huggingface_hub

Files changed (1)

README.md +29 -560

@@ -1,589 +1,58 @@
 ---
 base_model: Qwen/Qwen2.5-7B-Instruct
-base_model_relation: finetune
 library_name: transformers
-license: apache-2.0
-language:
-- en
-pipeline_tag: text-generation
 tags:
-- agent
-- function-calling
-- tool-use
-- h-res
-- manifold-steering
-- peft
-- uraion-labs
-- uraion
-- iclr-2026
-- associative-memory
-- hopfield
-- neural-collapse
-- qwen2.5
-- sft
 - trl
-- hermes-function-calling
-- apigen
-- xlam
-- toolace
-datasets:
-- NousResearch/hermes-function-calling-v1
-- Salesforce/xlam-function-calling-60k
-- mlabonne/FineTome-100k
-- Salesforce/APIGen-MT-5k
-- glaiveai/glaive-function-calling-v2
-- Team-ACE/ToolACE
-inference:
-  parameters:
-    temperature: 0.7
-    top_p: 0.95
-    max_new_tokens: 4096
----
-<p align="center">
-  <picture>
-    <source media="(prefers-color-scheme: dark)" srcset="https://uraionlabs.com/public/icons/icon-192.png">
-    <img src="https://uraionlabs.com/public/icons/icon-192.png" alt="Uraion Labs" width="64" height="64">
-  </picture>
-</p>
-<p align="center">
-  <strong style="font-family: 'Instrument Serif', Georgia, serif; font-size: 2rem; color: #F7F4ED; letter-spacing: -0.02em;">
-    Uraion Labs
-  </strong>
-  <br>
-  <span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">Foundational systems research.</span>
-</p>
-<p align="center">
-  <strong style="font-family: 'Inter', sans-serif; font-size: 1.15rem; color: #E45A1A;">
-    Uraion-Agent-Steer
-  </strong>
-  <br>
-  <span style="font-family: 'Inter', sans-serif; font-size: 0.875rem; color: #8A8478;">
-    Agentic LLM fine-tuned via Hierarchical Residual Steering (H-Res) — steers activations, not weights.
-  </span>
-</p>
----
-**Uraion-Agent-Steer** is a 7-billion parameter model adapted from [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) using **H-Res (Hierarchical Residual Steering)** — a novel PEFT method from ["Parallel Manifold Steering"](https://arxiv.org/abs/2606.24396) (ICLR Workshop 2026). Rather than modifying model weights (LoRA) or injecting synthetic tokens (VPT/Prefix Tuning), H-Res learns a **state-dependent vector field** that steers hidden activations into task-specific attractors — preserving the foundation model's associative memory while adapting it for agentic tool use.
-This is a research artifact in Uraion Labs' systems-first approach: studying novel adaptation mechanisms, the harness layer, evaluation, and deployment of agent-capable models. It is the first publicly available model trained with the full H-Res method.
-**Intelligence is a systems problem.** This model is one piece of that system — and the adaptation method itself is part of the research.
----
-## The H-Res Method
-### The problem with existing PEFT
-| Method | Mechanism | Fatal flaw |
-|--------|-----------|------------|
-| **LoRA** | Modifies weights globally | Catastrophic interference — distorts retrieval dynamics of pre-trained memories |
-| **VPT / Prefix Tuning** | Appends synthetic tokens to input | Buffer congestion — dilutes attention probability mass, weakens associative recall |
-| **H-Res** | Steers activations via vector field | *None of the above* — operates orthogonal to weights and input buffer |
-### How H-Res works
-H-Res frames Transformer adaptation as a **control problem on the activation manifold**. Each layer `l` receives a state-dependent residual:
-```
-z_{l+1} = Attn(z_l) + FFN(z_l) + λ · H_θ(z_l)
-where  H_θ(x) = W_up · GeLU(W_down · x)
-```
-- **W_down ∈ ℝ^{d×r}** — projects to a low-rank "control manifold" (bottleneck)
-- **W_up ∈ ℝ^{r×d}** — projects the steering signal back to activation space
-- **W_up initialized to zero** — no initialization shock; training starts from the pre-trained energy minimum
-- **λ** — learnable per-layer scaling factor
-- **Applied parallel to self-attention** — via forward hooks, orthogonal to the frozen backbone
-### Theoretical guarantees (from the paper)
-| Property | Proof |
-|----------|-------|
-| **Attention entropy preserved** | No synthetic tokens → constant sequence length → H(A_cls) minimal |
-| **Neural Collapse facilitated** | Residual adapter acts as Maxwell's Demon, filtering task-irrelevant noise |
-| **Zero initialization** | W_up = 0 → H_θ(z) = 0 at t=0 → training starts from global energy minimum |
-| **SSM-compatible** | Operates entirely in residual stream — compatible with Mamba, S4, DeltaNet |
-| **Multi-task orthogonality** | Null-Space Projection of gradients across tasks (Eq. 6 in paper) |
----
-## Contents
-- [Model Details](#model-details)
-- [H-Res Architecture (Deep Dive)](#h-res-architecture-deep-dive)
-- [Intended Uses & Limitations](#intended-uses--limitations)
-- [Training Data](#training-data)
-- [Training Procedure](#training-procedure)
-- [Hyperparameters](#hyperparameters)
-- [Training Loss](#training-loss)
-- [Quickstart](#quickstart)
-- [H-Res Adapter Analysis](#h-res-adapter-analysis)
-- [Hardware & Infrastructure](#hardware--infrastructure)
-- [GGUF Availability](#gguf-availability)
-- [Ethical Considerations](#ethical-considerations)
-- [Citations](#citations)
----
-## Model Details
-| Property | Value |
-|----------|-------|
-| **Base model** | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
-| **Architecture** | Qwen2.5ForCausalLM — 28-layer pure Transformer (RoPE, SwiGLU, RMSNorm) |
-| **Adaptation method** | **H-Res (Hierarchical Residual Steering)** — state-dependent vector field |
-| **Context length** | 32,768 tokens (native, inherited) |
-| **Parameters** | ~7.6B total, 12.8M H-Res trainable (0.17%) |
-| **H-Res rank** | r = 64 per layer |
-| **H-Res layers** | 28/28 injected (all layers compatible) |
-| **Precision** | BF16 (full precision — no quantization of base model) |
-| **License** | Apache 2.0 (inherited from Qwen2.5) |
-| **On-disk size** | ~15.3 GB (BF16 safetensors) |
-| **Paper** | [arXiv:2606.24396](https://arxiv.org/abs/2606.24396) — ICLR Workshop 2026 |
-### Architecture choice
-Qwen2.5-7B-Instruct was chosen for this H-Res implementation because:
-1. **Pure Transformer** — 28 identical decoder layers with standard `input_layernorm` + `self_attn` + `post_attention_layernorm` + `mlp` — cleanest architecture for H-Res hook injection
-2. **Apache 2.0 license** — no gated access, no approval required, fully open
-3. **Strong instruct base** — already instruction-tuned, providing a solid foundation for agentic adaptation
-4. **7B weight class** — punches above its weight on agent benchmarks while fitting comfortably on A100-40GB
----
-## H-Res Architecture (Deep Dive)
-### Injection mechanism
-H-Res adapters are injected into each transformer layer via **PyTorch forward hooks** — no monkey-patching of forward methods, no model code modification:
-```
-Layer forward (simplified):
-  ┌─────────────────────────────────────────────┐
-  │ residual = hidden_states                     │
-  │ normed = input_layernorm(hidden_states)      │
-  │                                              │
-  │ attn_out = self_attn(normed)     ← frozen   │
-  │ hres_out = hres(normed)          ← trained  │  ← Hook: captures normed, adds to attn output
-  │                                              │
-  │ hidden_states = residual + attn_out + hres_out │
-  │ hidden_states = hidden_states + mlp(norm(hidden_states)) │
-  └─────────────────────────────────────────────┘
-```
-### Per-layer H-Res parameters
-Each of the 28 layers contains:
-```
-HResAdapter:
-  W_down: Linear(3584 → 64, bias=False)   228,544 params
-  W_up:   Linear(64 → 3584, bias=False)   228,544 params
-  scale:  scalar (learnable)                    1 param
-  ─────────────────────────────────────────────────────
-  Total per layer:                        457,089 params
-  Total (28 layers):                   12,798,492 params
-  % of base model (7.6B):                    0.17%
-```
-### Initialization (per paper Section 2.3)
-```python
-W_down ~ N(0, 1/d_model)     # Normal with σ = 1/√3584
-W_up   = 0                    # Zero — preserves pre-trained energy minimum
-scale  = 0.1                  # Small constant — gentle ramp-up
-```
-At initialization, H_θ(x) = 0 for all x → the model behaves identically to the frozen base. Training gradually "turns on" the steering field.
-### What H-Res is NOT
-- **NOT LoRA** — doesn't modify frozen weights; computes input-dependent residuals
-- **NOT an adapter** — doesn't sit sequentially after attention/MLP; runs *parallel* to self-attention
-- **NOT a prompt method** — doesn't add tokens to the input sequence
-- **NOT a mixture-of-experts** — all layers are always active; the "expertise" is in the learned vector field
----
-## Intended Uses & Limitations
-### Intended use
-- **Tool-calling agents** — function calling, API orchestration, multi-turn tool use
-- **Agent frameworks** — drop-in replacement for agent runtimes (OpenAI-compatible via vLLM)
-- **Systems research** — studying the H-Res adaptation mechanism, its properties, and its limits
-- **Associative retrieval tasks** — the H-Res method specifically excels at retrieval (26% better than LoRA on SQuAD per the paper)
-### Out-of-scope
-- **Production deployment without validation** — research artifact; evaluate on your specific use case
-- **High-stakes decision making** — not intended for medical, legal, or financial advice without human oversight
-- **Unsupported languages** — trained exclusively on English data
-- **Multimodal tasks** — text-only fine-tune
-### Limitations
-- **Trained for 1 epoch** on ~35K examples. More data/epochs would improve tool-calling reliability.
-- **H-Res is a research method** — this is the first public deployment; edge cases may exist.
-- **GGUF conversion** — H-Res adapters are state-dependent (nonlinear), so they can't be directly merged into base weights for standard GGUF conversion. A LoRA-distilled GGUF version is available separately.
-- **May produce malformed tool calls** in edge cases — validate output before execution.
-- **7B weight class** — while punching above its weight, has inherent capacity limits compared to larger models.
----
-## Training Data
-Six datasets were curated for agentic capability — prioritizing function-calling and tool-use signal over raw instruction volume:
-| Dataset | Type | Samples | Focus |
-|---------|------|---------|-------|
-| [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1) | Function calling | 1,893 | Single-turn and multi-turn tool use conversations (MIT) |
-| [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | Function calling | 10,000 | Diverse API function calling (sampled from 60K, MIT) |
-| [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) | Instruction following | 20,000 | General instruct/chat data (sampled from 100K, MIT) |
-| [Salesforce/APIGen-MT-5k](https://huggingface.co/datasets/Salesforce/APIGen-MT-5k) | API generation | 5,000 | Multi-turn API call generation across diverse APIs (MIT) |
-| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | Function calling | 8,000 | Multi-turn tool-use conversations (MIT) |
-| [Team-ACE/ToolACE](https://huggingface.co/datasets/Team-ACE/ToolACE) | Tool use | 8,000 | Agentic tool-use conversations (Apache 2.0) |
-| **Total** | | **52,893 raw → 34,893 filtered** | |
-All data formatted via `tokenizer.apply_chat_template()` with the Qwen2.5 ChatML template. Examples without a `user` role were filtered. Sequence length capped at 2,048 tokens.
----
-## Training Procedure
-### Framework
-- **Training**: HuggingFace TRL `SFTTrainer` with `SFTConfig`
-- **Adaptation**: H-Res — custom `HResAdapter` injected via forward hooks (no PEFT library dependency for the core method)
-- **Quantization**: None — full BF16 precision for base model (H-Res adds only 0.17% trainable params)
-- **Attention**: PyTorch SDPA (`attn_implementation="sdpa"`)
-- **Loss**: Standard causal language modeling (no packing)
-### Pipeline
-1. **Model loading**: BF16 full precision via `AutoModelForCausalLM.from_pretrained()`
-2. **H-Res injection**: Forward hooks on `input_layernorm` (capture) + `self_attn` (inject)
-3. **Base model freeze**: `model.requires_grad_(False)` — only H-Res params trainable
-4. **Dataset processing**: ShareGPT → ChatML → filtered → concatenated → shuffled
-5. **Training**: `SFTTrainer` with `dataset_text_field="text"`, `packing=False`, `gradient_checkpointing=True`
-6. **Export**: `model.save_pretrained(safe_serialization=True)` — H-Res adapters embedded in model state dict
-7. **Upload**: `HfApi.upload_folder()` → `UraionLabs/Uraion-Agent-Steer`
-### Novel aspects
-This training represents the **first public implementation** of the full H-Res method:
-- **Hook-based injection** — no model code modification; works with any HuggingFace Transformer
-- **Full BF16 precision** — no quantization noise; H-Res is parameter-efficient enough to not need it
-- **Learnable scale parameter λ** — per-layer, initialized at 0.1, allowing layers to independently adjust steering intensity
-- **Architecture-agnostic** — the same injection code works on Llama, Mistral, Qwen2/3, Gemma, and Phi
----
-## Hyperparameters
-### H-Res
-| Parameter | Value |
-|-----------|-------|
-| `r` (bottleneck rank) | 64 |
-| `d_model` (hidden size) | 3584 |
-| `W_down init` | N(0, 1/d_model) |
-| `W_up init` | 0 (zero) |
-| `scale init` | 0.1 |
-| `activation` | GeLU |
-| `bias` | None |
-### Training
-| Parameter | Value |
-|-----------|-------|
-| **Sequence length** | 2048 |
-| **Effective batch size** | 32 |
-| **Per-device batch** | 2 |
-| **Gradient accumulation** | 16 |
-| **Learning rate** | 1×10⁻⁴ |
-| **LR scheduler** | Cosine with warmup |
-| **Warmup ratio** | 0.03 |
-| **Optimizer** | AdamW 8-bit |
-| **Epochs** | 1 |
-| **Max steps** | 1,091 |
-| **Weight decay** | 0.0 |
-| **Gradient checkpointing** | True (non-reentrant) |
-| **Precision** | BF16 |
-| **Logging steps** | 10 |
-| **Save steps** | 50 |
-| **Save total limit** | 3 |
 ---
-## Training Loss
-| Step | Loss | Δ from start | Notes |
-|------|------|-------------|-------|
-| 10 | 1.310 | — | Initial — H-Res scale still ramping |
-| 20 | 1.264 | ↓ 3.5% | W_up beginning to activate |
-| 50 | 1.013 | ↓ 22.7% | First checkpoint saved; steering field forming |
-| 100 | 0.879 | ↓ 32.9% | Rapid convergence phase |
-| 200 | 0.741 | ↓ 43.4% | Entering fine-tuning regime |
-| 300 | 0.745 | ↓ 43.1% | Stable convergence |
-| 400 | 0.699 | ↓ 46.6% | Steady improvement |
-| 500 | 0.689 | ↓ 47.4% | Approaching plateau |
-| 600 | 0.645 | ↓ 50.8% | Best single-step loss |
-| 700 | 0.688 | ↓ 47.5% | Minor oscillation — normal |
-| 800 | 0.646 | ↓ 50.7% | Consistent low-loss regime |
-| 900 | 0.663 | ↓ 49.4% | Stable |
-| 1000 | 0.67 | ↓ 48.9% | Final stretch |
-| **1091** | **0.657** | **↓ 49.8%** | **Final — 50% loss reduction** |
-**Key observations:**
-- **Rapid early convergence** — 22.7% loss reduction by step 50 (first 4.6% of training)
-- **Smooth learning curve** — no spikes, no divergence, consistent downward trend
-- **50% total loss reduction** — from 1.310 to 0.657
-- **H-Res's zero-initialization advantage** — no "initialization shock" means the model starts from a good place and improves monotonically
----
-## Quickstart
-### Transformers (recommended for full quality)
 ```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_name = "UraionLabs/Uraion-Agent-Steer"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
-    trust_remote_code=True,
-)
-# The model includes H-Res adapters — no extra loading needed
-messages = [
-    {"role": "system", "content": "You are Uraion-Agent-Steer, an agent with tool-use capabilities. Use tools when appropriate."},
-    {"role": "user", "content": "What's the weather in Tokyo? Should I bring an umbrella?"},
-]
-text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = tokenizer(text, return_tensors="pt").to(model.device)
-outputs = model.generate(
-    **inputs,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    do_sample=True,
-)
-response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
-print(response)
-```
-### With `pipeline`
-```python
-import torch
 from transformers import pipeline
-pipe = pipeline(
-    "text-generation",
-    model="UraionLabs/Uraion-Agent-Steer",
-    torch_dtype=torch.bfloat16,
-    device_map="auto",
-    trust_remote_code=True,
-)
-messages = [
-    {"role": "system", "content": "You are a helpful agent with access to tools."},
-    {"role": "user", "content": "Search for the latest AI research papers on arxiv."},
-]
-output = pipe(messages, max_new_tokens=512, temperature=0.7, top_p=0.95)
-print(output[0]["generated_text"][-1]["content"] if isinstance(output[0]["generated_text"], list) else output[0]["generated_text"])
 ```
----
-## H-Res Adapter Analysis
-After training, we inspected the learned H-Res adapters across all 28 layers:
-| Layer | Scale (λ) | ‖W_up‖ | ‖W_down‖ | Steering activity |
-|-------|-----------|--------|----------|-------------------|
-| 0 (early) | 0.1001 | 0.0000 | 7.94 | **Silent** — shallow layers don't steer |
-| 8 (mid) | 0.1001 | 2.12 | 8.45 | Moderate steering |
-| 16 (mid-deep) | 0.1001 | 2.87 | 9.12 | Active steering |
-| 24 (deep) | 0.1001 | 3.12 | 9.56 | Strong steering |
-| 27 (final) | 0.1001 | **3.72** | **9.69** | **Maximum steering** |
-**Key finding:** Steering intensity increases monotonically with layer depth. Early layers (0–3) have W_up ≈ 0 — the adapter is effectively dormant. Deep layers (20–27) have the strongest steering activity. This aligns with the paper's theoretical prediction: H-Res acts primarily on high-level semantic representations in deeper layers, while preserving low-level features in early layers.
-The scale parameter λ stayed at ~0.1 across all layers — the model preferred to learn through W_up/W_down rather than adjusting the global scaling factor.
----
-## Hardware & Infrastructure
-| Component | Detail |
-|-----------|--------|
-| **Provisioning** | Google Colab CLI (`colab-cli`) via OAuth2 |
-| **GPU** | 1× NVIDIA A100-SXM4-40GB |
-| **Runtime** | `colab run --gpu A100 --keep --timeout 28800` |
-| **Training time** | ~3 hours (1,091 steps at ~10s/step) |
-| **VRAM usage** | ~35 GB (7.6B BF16 base + 12.8M H-Res + activations + optimizer) |
-| **Setup** | Self-installing dependencies via pip |
-| **Session lifecycle** | `colab run` → auto-execute → `--keep` → training → auto-upload → session release |
-Training dependencies auto-installed on Colab: `transformers>=4.57`, `trl>=0.21`, `datasets`, `accelerate`, `safetensors`, `huggingface_hub`.
----
-## GGUF Availability
-H-Res adapters are **state-dependent** (nonlinear function of the input), so they can't be directly merged into base weights for standard GGUF/llama.cpp conversion. A separate **LoRA-distilled version** is available for GGUF users:
-| Format | Repository | Notes |
-|--------|-----------|-------|
-| **Safetensors (H-Res)** | `UraionLabs/Uraion-Agent-Steer` | This repo — full quality, original H-Res method |
-| **GGUF (LoRA-distilled)** | `UraionLabs/Uraion-Agent-Steer-GGUF` | LoRA trained on same data, merged, quantized to all common variants |
-For maximum quality, use this safetensors release. For local llama.cpp/Ollama/LM Studio inference, use the GGUF release.
----
-## Ethical Considerations
-This model is a fine-tune of Qwen2.5-7B-Instruct and inherits its base capabilities and biases:
-- Training data includes user-generated content from HuggingFace datasets, which may contain biases.
-- Function-calling capabilities could automate actions without human oversight — always validate tool calls before execution.
-- The model has not undergone safety alignment beyond the base model's existing safeguards.
-- The H-Res method is novel — long-term behavior and failure modes are still being studied.
-- This is a **research-stage artifact** from Uraion Labs. We are a systems research lab, not a product company. Use accordingly.
----
 ## Citations
-### H-Res (Parallel Manifold Steering)
-```bibtex
-@article{awadhiya2026parallel,
-  title={Parallel Manifold Steering: Efficient Adaptation of Large
-         Associative Memories via Residual Energy Shaping},
-  author={Awadhiya, Kanishk},
-  journal={ICLR Workshop on New Frontiers in Associative Memory},
-  year={2026},
-  url={https://arxiv.org/abs/2606.24396}
-}
-```
-### Uraion-Agent-Steer
-```bibtex
-@software{uraion-agent-steer,
-  title={Uraion-Agent-Steer: Agentic Model via Hierarchical Residual Steering},
-  author={Uraion Labs},
-  year={2026},
-  url={https://huggingface.co/UraionLabs/Uraion-Agent-Steer}
-}
-```
-### Qwen2.5
-```bibtex
-@misc{qwen2.5,
-  title={Qwen2.5: A Party of Foundation Models},
-  author={Qwen Team},
-  year={2025},
-  publisher={GitHub},
-  url={https://github.com/QwenLM/Qwen2.5}
-}
-```
-### TRL
 ```bibtex
 @software{vonwerra2020trl,
-  title={{TRL: Transformers Reinforcement Learning}},
-  author={von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and
-          Beeching, Edward and Thrush, Tristan and Lambert, Nathan and
-          Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
-  license={Apache-2.0},
-  url={https://github.com/huggingface/trl},
-  year={2020}
-}
-```
-### Datasets
-```bibtex
-@misc{hermesfc,
-  title={NousResearch Hermes Function Calling},
-  author={Nous Research},
-  year={2024},
-  url={https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1}
-}
-@misc{xlam2024,
-  title={xLAM: A Family of Large Action Models},
-  author={Salesforce AI Research},
-  year={2024},
-  url={https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k}
-}
-@misc{finetome2024,
-  title={FineTome-100k: A Curated Instruction Tuning Dataset},
-  author={Labonne, Maxime},
-  year={2024},
-  url={https://huggingface.co/datasets/mlabonne/FineTome-100k}
-}
-@misc{apigen2024,
-  title={APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets},
-  author={Salesforce AI Research},
-  year={2024},
-  url={https://huggingface.co/datasets/Salesforce/APIGen-MT-5k}
 }
-@misc{glaivefc,
-  title={Glaive Function Calling v2},
-  author={Glaive AI},
-  year={2024},
-  url={https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2}
-}
-@misc{toolace2025,
-  title={ToolACE: Winning the Points of LLM Function Calling},
-  author={Team ACE},
-  year={2025},
-  url={https://huggingface.co/datasets/Team-ACE/ToolACE}
-}
-```
----
-<p align="center">
-  <img src="https://uraionlabs.com/public/icons/icon-32.png" alt="" width="24" height="24">
-</p>
-<p align="center" style="font-family: 'Inter', sans-serif; font-size: 0.8rem; color: #8A8478;">
-  <strong style="color: #F7F4ED;">Uraion Labs</strong> — Foundational systems research.
-  <br>
-  <a href="https://uraionlabs.com" style="color: #E45A1A;">uraionlabs.com</a>
-  <br><br>
-  <em style="color: #6F6A61;">
-    Intelligence is a systems problem.
-  </em>
-  <br>
-  Licensed under <a href="https://www.apache.org/licenses/LICENSE-2.0" style="color: #E45A1A;">Apache 2.0</a>.
-</p>

 ---
 base_model: Qwen/Qwen2.5-7B-Instruct
 library_name: transformers
+model_name: uraion-agent-steer
 tags:
+- generated_from_trainer
 - trl
+- sft
+licence: license
 ---
+# Model Card for uraion-agent-steer
+This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
 ```python
 from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="None", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
 ```
+## Training procedure
+This model was trained with SFT.
+### Framework versions
+- TRL: 1.7.0
+- Transformers: 5.12.0
+- Pytorch: 2.11.0+cu128
+- Datasets: 5.0.0
+- Tokenizers: 0.22.2
 ## Citations
+Cite TRL as:
 ```bibtex
 @software{vonwerra2020trl,
+  title   = {{TRL: Transformers Reinforcement Learning}},
+  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+  license = {Apache-2.0},
+  url     = {https://github.com/huggingface/trl},
+  year    = {2020}
 }
+```