Instructions for using narcolepticchicken/legal-agent-micro-v2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use narcolepticchicken/legal-agent-micro-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="narcolepticchicken/legal-agent-micro-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

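# Load model directly. AutoModelForCausalLM keeps the language-modeling head that generate() needs.
# This repo holds a LoRA adapter, so loading it this way assumes the peft package is installed.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("narcolepticchicken/legal-agent-micro-v2", dtype="auto")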


vLLM

How to use narcolepticchicken/legal-agent-micro-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "narcolepticchicken/legal-agent-micro-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "narcolepticchicken/legal-agent-micro-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
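
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the default base URL assumes the vLLM server from the previous step is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "narcolepticchicken/legal-agent-micro-v2"


def build_chat_request(user_content, base_url="http://localhost:8000", model=MODEL_ID):
    """Build the same POST request that the curl example sends (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=30.0):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With a server running, `send_chat_request(build_chat_request("..."))` returns the generated text. For the SGLang server, pass `base_url="http://localhost:30000"`.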

Use Docker

docker model run hf.co/narcolepticchicken/legal-agent-micro-v2

SGLang

How to use narcolepticchicken/legal-agent-micro-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "narcolepticchicken/legal-agent-micro-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "narcolepticchicken/legal-agent-micro-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "narcolepticchicken/legal-agent-micro-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "narcolepticchicken/legal-agent-micro-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Legal-Agent Micro-Model v2

A trace-trained Qwen3-0.6B + LoRA classifier fine-tuned on 2,750 synthetic legal-agent execution traces for first-pass routing and classification across 5 legal-agent tasks.

Tasks

| Task | Description | Labels |
| --- | --- | --- |
| escalation | Should this matter escalate to a senior attorney? | `ESCALATE`, `NO_ESCALATE` |
| tool_routing | Which legal research tool should handle this request? | `statute_lookup`, `case_search`, `clause_extractor`, `citation_validator`, `contract_comparator`, `docket_checker`, `jurisdiction_mapper` |
| answer_check | Is this legal answer source-grounded or hallucinated? | `GROUNDED`, `HALLUCINATED` |
| playbook | Which contract playbook category applies? | `NDA`, `M&A`, `Employment`, `IP License`, `SaaS Agreement`, `Settlement`, `Loan Agreement`, `Commercial Lease`, `Insurance Policy`, `Compliance Filing` |
| memory_safety | Is this memory entry safe to write? | `SAFE_TO_WRITE`, `BLOCKED` |
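
Because the model emits its label as free text, callers may want to check the output against the label set for the task. The sketch below copies the label sets from the table above; the `TASK_LABELS` name, the `parse_label` helper, and its matching rules (case-insensitive exact match, then longest-prefix match) are illustrative assumptions, not part of the released code.

```python
# Label sets copied from the task table; the dict name is hypothetical.
TASK_LABELS = {
    "escalation": ["ESCALATE", "NO_ESCALATE"],
    "tool_routing": [
        "statute_lookup", "case_search", "clause_extractor", "citation_validator",
        "contract_comparator", "docket_checker", "jurisdiction_mapper",
    ],
    "answer_check": ["GROUNDED", "HALLUCINATED"],
    "playbook": [
        "NDA", "M&A", "Employment", "IP License", "SaaS Agreement", "Settlement",
        "Loan Agreement", "Commercial Lease", "Insurance Policy", "Compliance Filing",
    ],
    "memory_safety": ["SAFE_TO_WRITE", "BLOCKED"],
}


def parse_label(task, generated_text):
    """Map generated text to one of the task's labels, or None if nothing matches."""
    cleaned = generated_text.strip().lower()
    labels = TASK_LABELS[task]
    # Prefer an exact (case-insensitive) match.
    for label in labels:
        if cleaned == label.lower():
            return label
    # Otherwise accept the longest label that the text starts with,
    # which tolerates trailing punctuation or explanation.
    for label in sorted(labels, key=len, reverse=True):
        if cleaned.startswith(label.lower()):
            return label
    return None
```

Returning `None` for unrecognized output gives the caller an explicit signal to fall back to a larger model rather than act on an invalid label.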

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    attn_implementation="kernels-community/flash-attn2",
    device_map="auto",
)
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "narcolepticchicken/legal-agent-micro-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Classify a legal trace
messages = [
    {"role": "user", "content": "# Task: Classify whether this legal matter requires escalation...\n\n## User Request\nClient received notice of regulatory investigation by SEC.\n\n## Context\n{\"jurisdiction\": \"Federal, USA\", \"matter\": \"SEC Investigation\"}\n\n## Agent Plan\n1. Assess\n2. Check escalation policy\n\n## Intermediate Answer\nSEC Section 21(a) inquiry triggers mandatory escalation per policy 4.2(a).\n\nBased on the above trace, what is the correct classification?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move inputs to the device chosen by device_map="auto" instead of hard-coding "cuda"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False, pad_token_id=tokenizer.pad_token_id)

result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)  # "ESCALATE"
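
The user message in the example follows a fixed layout: a task line, then User Request, Context, Agent Plan, and Intermediate Answer sections, then a closing question. A small builder can assemble that layout from structured fields. This is a sketch inferred from the single example above; the function name `build_trace_prompt` and its parameters are hypothetical, and the task instruction is passed in by the caller because the example shows it only in truncated form.

```python
import json


def build_trace_prompt(task_instruction, user_request, context, plan_steps, intermediate_answer):
    """Assemble a trace prompt in the layout used by the usage example (hypothetical helper)."""
    # Number the plan steps as "1. ...", "2. ..." on separate lines.
    plan = "\n".join(f"{i}. {step}" for i, step in enumerate(plan_steps, start=1))
    return (
        f"# Task: {task_instruction}\n\n"
        f"## User Request\n{user_request}\n\n"
        f"## Context\n{json.dumps(context)}\n\n"
        f"## Agent Plan\n{plan}\n\n"
        f"## Intermediate Answer\n{intermediate_answer}\n\n"
        "Based on the above trace, what is the correct classification?"
    )
```

The returned string can be used as the `content` of the single user message before calling `tokenizer.apply_chat_template`.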

Routing Policy

See routing_policy.py for the full tiered routing logic:

Tier 1: Micro-Model (this model) — first-pass classification, ~50ms, ~$0.002/call
Tier 2: SOTA Fallback (Qwen3-8B) — for low-confidence cases (~10-25%)
Tier 3: Verifier loop — re-run for safety-critical decisions (memory_safety, escalation)
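
The three tiers can be read as a simple decision procedure. The sketch below is not the contents of routing_policy.py; it is an illustration under stated assumptions: the classifier and verifier callables are hypothetical interfaces, the 0.9 confidence threshold is a placeholder, and deferring to the fallback model when verification fails is one possible choice.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Tasks the tier list marks as safety-critical (Tier 3).
SAFETY_CRITICAL_TASKS = frozenset({"memory_safety", "escalation"})

# Hypothetical interface: (task, prompt) -> (label, confidence in [0, 1]).
Classifier = Callable[[str, str], Tuple[str, float]]
# Hypothetical interface: (task, prompt, label) -> True if the label is confirmed.
Verifier = Callable[[str, str, str], bool]


@dataclass(frozen=True)
class RoutingDecision:
    label: str
    tier: int  # tier that produced or confirmed the final label


def route(task, prompt, micro_model, fallback_model, verifier, confidence_threshold=0.9):
    """Return a label and the tier that decided it (threshold value is an assumption)."""
    label, confidence = micro_model(task, prompt)
    # Tier 2: low-confidence predictions go to the larger fallback model.
    if confidence < confidence_threshold:
        fallback_label, _ = fallback_model(task, prompt)
        return RoutingDecision(fallback_label, tier=2)
    # Tier 3: safety-critical tasks get an extra verification pass.
    if task in SAFETY_CRITICAL_TASKS:
        if verifier(task, prompt, label):
            return RoutingDecision(label, tier=3)
        # Assumption: an unconfirmed label is re-decided by the fallback model.
        fallback_label, _ = fallback_model(task, prompt)
        return RoutingDecision(fallback_label, tier=2)
    # Tier 1: confident, non-critical predictions are accepted as-is.
    return RoutingDecision(label, tier=1)
```

Passing the models in as callables keeps the routing logic testable without loading any weights.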

Training Details

| Parameter | Value |
| --- | --- |
| Base Model | Qwen3-0.6B (751M params, Apache 2.0) |
| Method | LoRA SFT (r=64, alpha=128, target=all-linear) |
| Dataset | 2,750 synthetic legal-agent traces (v2) |
| Train/Val | 2,337 / 413 |
| Epochs | 3 |
| Learning Rate | 3e-4 (cosine schedule, warmup 5%) |
| Effective Batch | 16 (4 per device × 4 accumulation) |
| Precision | bf16, flash-attn2 (Hub kernel) |
| Loss | Assistant-only cross-entropy on conversational format |
| Training Time | ~16 min on A10G-large |
| Final Eval Loss | 7.4e-06 |
| Final Eval Accuracy | 100% (token) |
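
The batch and schedule figures in the table imply a rough optimizer step count and learning-rate curve. The sketch below works through that arithmetic in plain Python. It assumes a single device, rounds partial batches up, and uses a standard linear-warmup plus half-cosine decay; the exact step count and schedule used by the actual training run may differ slightly.

```python
import math

# Values taken from the training table.
TRAIN_EXAMPLES = 2337
VAL_EXAMPLES = 413
PER_DEVICE_BATCH = 4
GRAD_ACCUMULATION = 4
EPOCHS = 3
PEAK_LR = 3e-4
WARMUP_RATIO = 0.05

# Assumption: one device, so the effective batch is 4 * 4 = 16.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUMULATION
# Assumption: the last partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run is only a few hundred optimizer steps long, which is consistent with the short training time reported in the table.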

Dataset

Synthetic traces available at:

narcolepticchicken/legal-agent-traces-v2 (2,750 traces, 50% adversarial/hard)
narcolepticchicken/legalbench-transfer-eval (16 LegalBench-style transfer samples)

Limitations

Trained on synthetic data only — may not generalize to real legal scenarios
751M params — not suitable for complex legal reasoning; classification/routing only
English-only legal domain (primarily US/UK jurisdictions)
The model outputs classification labels from trace context — it does not execute tool calls directly


Model tree for narcolepticchicken/legal-agent-micro-v2

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-0.6B
Adapter: this model