Legal-Agent Micro-Model v2

A Qwen3-0.6B classifier with a LoRA adapter, fine-tuned on 2,750 synthetic legal-agent execution traces. It handles first-pass routing and classification across 5 legal-agent tasks.

Tasks

| Task | Description | Labels |
|---|---|---|
| escalation | Should this matter escalate to a senior attorney? | ESCALATE, NO_ESCALATE |
| tool_routing | Which legal research tool should handle this request? | statute_lookup, case_search, clause_extractor, citation_validator, contract_comparator, docket_checker, jurisdiction_mapper |
| answer_check | Is this legal answer source-grounded or hallucinated? | GROUNDED, HALLUCINATED |
| playbook | Which contract playbook category applies? | NDA, M&A, Employment, IP License, SaaS Agreement, Settlement, Loan Agreement, Commercial Lease, Insurance Policy, Compliance Filing |
| memory_safety | Is this memory entry safe to write? | SAFE_TO_WRITE, BLOCKED |
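Because the model emits its label as generated text, callers will usually want to check the output against the label set for the task. The following is a minimal sketch of such a check, using the labels from the table above; the names `TASK_LABELS` and `parse_label` are hypothetical and not part of the released code.

```python
from typing import Dict, List, Optional

# Label sets copied from the task table in this card.
TASK_LABELS: Dict[str, List[str]] = {
    "escalation": ["ESCALATE", "NO_ESCALATE"],
    "tool_routing": [
        "statute_lookup", "case_search", "clause_extractor",
        "citation_validator", "contract_comparator",
        "docket_checker", "jurisdiction_mapper",
    ],
    "answer_check": ["GROUNDED", "HALLUCINATED"],
    "playbook": [
        "NDA", "M&A", "Employment", "IP License", "SaaS Agreement",
        "Settlement", "Loan Agreement", "Commercial Lease",
        "Insurance Policy", "Compliance Filing",
    ],
    "memory_safety": ["SAFE_TO_WRITE", "BLOCKED"],
}


def parse_label(task: str, generated: str) -> Optional[str]:
    """Return the canonical label matching the generated text, or None.

    Matching is exact after trimming whitespace and ignoring case
    (an assumption; stricter or looser matching may suit your pipeline).
    """
    text = generated.strip().casefold()
    for label in TASK_LABELS[task]:
        if text == label.casefold():
            return label
    return None
```

A `None` result is a natural signal to send the request to a fallback model instead of trusting the output.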

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    attn_implementation="kernels-community/flash-attn2",
    device_map="auto",
)
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "narcolepticchicken/legal-agent-micro-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Classify a legal trace
messages = [
    {"role": "user", "content": "# Task: Classify whether this legal matter requires escalation...\n\n## User Request\nClient received notice of regulatory investigation by SEC.\n\n## Context\n{\"jurisdiction\": \"Federal, USA\", \"matter\": \"SEC Investigation\"}\n\n## Agent Plan\n1. Assess\n2. Check escalation policy\n\n## Intermediate Answer\nSEC Section 21(a) inquiry triggers mandatory escalation per policy 4.2(a).\n\nBased on the above trace, what is the correct classification?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
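# Move inputs to the model's device instead of hard-coding "cuda" (device_map="auto" chooses the placement)
inputs = tokenizer(text, return_tensors="pt").to(model.device)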

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False, pad_token_id=tokenizer.pad_token_id)

result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)  # "ESCALATE"
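The user message above follows a fixed layout: a task line, then User Request, Context, Agent Plan, and Intermediate Answer sections, then a closing question. A small helper can assemble that layout from structured fields. This is a sketch inferred from the single example in this card; `build_trace_prompt` is a hypothetical name, and the training data may use variations this helper does not cover.

```python
import json
from typing import Any, Dict, List


def build_trace_prompt(
    task_instruction: str,
    user_request: str,
    context: Dict[str, Any],
    plan_steps: List[str],
    intermediate_answer: str,
) -> str:
    """Assemble a trace prompt in the layout shown in the usage example."""
    # Number the plan steps as "1. ...", "2. ..." on separate lines.
    plan = "\n".join(f"{i}. {step}" for i, step in enumerate(plan_steps, start=1))
    return (
        f"# Task: {task_instruction}\n\n"
        f"## User Request\n{user_request}\n\n"
        f"## Context\n{json.dumps(context)}\n\n"
        f"## Agent Plan\n{plan}\n\n"
        f"## Intermediate Answer\n{intermediate_answer}\n\n"
        "Based on the above trace, what is the correct classification?"
    )


# The task instruction is elided ("...") in this card; it is passed through
# verbatim here rather than guessed.
prompt = build_trace_prompt(
    task_instruction="Classify whether this legal matter requires escalation...",
    user_request="Client received notice of regulatory investigation by SEC.",
    context={"jurisdiction": "Federal, USA", "matter": "SEC Investigation"},
    plan_steps=["Assess", "Check escalation policy"],
    intermediate_answer=(
        "SEC Section 21(a) inquiry triggers mandatory escalation per policy 4.2(a)."
    ),
)
messages = [{"role": "user", "content": prompt}]
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` exactly as in the usage example.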

Routing Policy

See routing_policy.py for the full tiered routing logic:

  • Tier 1, Micro-Model (this model): first-pass classification, ~50ms, ~$0.002/call
  • Tier 2, SOTA Fallback (Qwen3-8B): for low-confidence cases (~10-25%)
  • Tier 3, Verifier loop: re-run for safety-critical decisions (memory_safety, escalation)
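The three tiers above can be sketched as a single routing function. This is an illustrative sketch only, not the contents of routing_policy.py. The `route` function, the 0.9 threshold, the classifier call signature, and the choice to fall back to the conservative label when the verifier disagrees are all assumptions. How a confidence score is obtained (for example, from token probabilities) is left to the caller.

```python
from typing import Callable, Dict, Tuple

# A classifier takes (task, trace_text) and returns (label, confidence in [0, 1]).
Classifier = Callable[[str, str], Tuple[str, float]]

# Safety-critical tasks named in this card, each mapped to the label assumed
# to be the conservative choice when the verifier disagrees.
CONSERVATIVE_LABEL: Dict[str, str] = {
    "memory_safety": "BLOCKED",
    "escalation": "ESCALATE",
}


def route(
    task: str,
    trace: str,
    micro: Classifier,
    fallback: Classifier,
    verifier: Classifier,
    threshold: float = 0.9,  # hypothetical cut-off, tune on validation data
) -> Tuple[str, str]:
    """Return (label, tier) for one trace."""
    # Tier 1: cheap first-pass classification.
    label, confidence = micro(task, trace)
    tier = "tier1_micro"

    # Tier 2: hand low-confidence cases to the larger model.
    if confidence < threshold:
        label, _ = fallback(task, trace)
        tier = "tier2_fallback"

    # Tier 3: re-check safety-critical decisions with a verifier pass.
    if task in CONSERVATIVE_LABEL:
        verified_label, _ = verifier(task, trace)
        if verified_label != label:
            return CONSERVATIVE_LABEL[task], "tier3_verifier"

    return label, tier
```

Returning the tier alongside the label makes it easy to log how often each tier decides, which helps check the share of low-confidence cases against the estimate above.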

Training Details

| Parameter | Value |
|---|---|
| Base Model | Qwen3-0.6B (751M params, Apache 2.0) |
| Method | LoRA SFT (r=64, alpha=128, target=all-linear) |
| Dataset | 2,750 synthetic legal-agent traces (v2) |
| Train/Val | 2,337 / 413 |
| Epochs | 3 |
| Learning Rate | 3e-4 (cosine schedule, warmup 5%) |
| Effective Batch | 16 (4 per device × 4 accumulation) |
| Precision | bf16, flash-attn2 (Hub kernel) |
| Loss | Assistant-only cross-entropy on conversational format |
| Training Time | ~16 min on A10G-large |
| Final Eval Loss | 7.4e-06 |
| Final Eval Accuracy | 100% (token) |
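The schedule implied by these settings can be worked out with a few lines of arithmetic. The sketch below assumes a single device and that a final partial batch counts as one optimizer step; the exact step count can differ by a step or two depending on the trainer version.

```python
import math

# Values taken from the training table above.
train_examples, val_examples = 2337, 413
per_device_batch, grad_accum, epochs = 4, 4, 3
warmup_ratio = 0.05

effective_batch = per_device_batch * grad_accum                 # 4 x 4 = 16
steps_per_epoch = math.ceil(train_examples / effective_batch)   # ceil(146.06) = 147
total_steps = steps_per_epoch * epochs                          # 147 x 3 = 441
warmup_steps = math.ceil(total_steps * warmup_ratio)            # ceil(22.05) = 23
val_fraction = val_examples / (train_examples + val_examples)   # about 0.15

print(f"effective batch: {effective_batch}")
print(f"optimizer steps: {total_steps} ({steps_per_epoch} per epoch)")
print(f"warmup steps:    {warmup_steps}")
print(f"val fraction:    {val_fraction:.3f}")
```

With roughly 441 optimizer steps in total, the learning rate warms up over about the first 23 steps and then follows the cosine decay.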

Dataset

Synthetic traces available at:

Limitations

  • Trained on synthetic data only; may not generalize to real legal scenarios
  • 751M params; not suitable for complex legal reasoning, classification/routing only
  • English-only legal domain (primarily US/UK jurisdictions)
  • The model outputs classification labels from trace context; it does not execute tool calls directly