Legal-Agent Micro-Model v2
A Qwen3-0.6B + LoRA classifier fine-tuned on 2,750 synthetic legal-agent execution traces. It handles first-pass routing and classification across five legal-agent tasks.
Tasks
| Task | Description | Labels |
|---|---|---|
| escalation | Should this matter escalate to a senior attorney? | ESCALATE, NO_ESCALATE |
| tool_routing | Which legal research tool should handle this request? | statute_lookup, case_search, clause_extractor, citation_validator, contract_comparator, docket_checker, jurisdiction_mapper |
| answer_check | Is this legal answer source-grounded or hallucinated? | GROUNDED, HALLUCINATED |
| playbook | Which contract playbook category applies? | NDA, M&A, Employment, IP License, SaaS Agreement, Settlement, Loan Agreement, Commercial Lease, Insurance Policy, Compliance Filing |
| memory_safety | Is this memory entry safe to write? | SAFE_TO_WRITE, BLOCKED |
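
Because the model emits its label as free text, a caller usually wants to check the output against the allowed label set for the task. The following is a minimal sketch of that check: the label sets are copied from the table above, while `TASK_LABELS` and `parse_label` are hypothetical names introduced here and are not part of the released code.

```python
# Label sets copied from the task table; the names below are illustrative only.
TASK_LABELS = {
    "escalation": ["ESCALATE", "NO_ESCALATE"],
    "tool_routing": [
        "statute_lookup", "case_search", "clause_extractor", "citation_validator",
        "contract_comparator", "docket_checker", "jurisdiction_mapper",
    ],
    "answer_check": ["GROUNDED", "HALLUCINATED"],
    "playbook": [
        "NDA", "M&A", "Employment", "IP License", "SaaS Agreement", "Settlement",
        "Loan Agreement", "Commercial Lease", "Insurance Policy", "Compliance Filing",
    ],
    "memory_safety": ["SAFE_TO_WRITE", "BLOCKED"],
}


def parse_label(task, generated):
    """Return the canonical label for `task`, or None if the output is not a known label.

    Only the first line of the generation is considered, and it must match a
    label exactly (ignoring case, surrounding quotes, and a trailing period).
    """
    lines = generated.strip().splitlines()
    if not lines:
        return None
    candidate = lines[0].strip(" \t\"'`.").casefold()
    for label in TASK_LABELS[task]:
        if label.casefold() == candidate:
            return label
    return None
```

Exact matching is deliberately strict: a substring test would accept "standard" as containing "NDA". Returning `None` gives the caller an explicit signal to hand the request to a stronger model.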
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    torch_dtype=torch.bfloat16,
    attn_implementation="kernels-community/flash-attn2",
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "narcolepticchicken/legal-agent-micro-v2")
model.eval()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Classify a legal trace
messages = [
    {"role": "user", "content": "# Task: Classify whether this legal matter requires escalation...\n\n## User Request\nClient received notice of regulatory investigation by SEC.\n\n## Context\n{\"jurisdiction\": \"Federal, USA\", \"matter\": \"SEC Investigation\"}\n\n## Agent Plan\n1. Assess\n2. Check escalation policy\n\n## Intermediate Answer\nSEC Section 21(a) inquiry triggers mandatory escalation per policy 4.2(a).\n\nBased on the above trace, what is the correct classification?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move inputs to wherever device_map="auto" placed the model instead of hard-coding "cuda"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False, pad_token_id=tokenizer.pad_token_id)
# Decode only the newly generated tokens, skipping the prompt
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)  # "ESCALATE"
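
The user message in the example follows a fixed layout: a task line, then `User Request`, `Context`, `Agent Plan`, and `Intermediate Answer` sections, then a closing question. A small helper can assemble that layout from structured fields. This is a sketch inferred from the single example above; `build_trace_prompt` is a hypothetical name, and the task instruction is passed through as-is because its full wording is elided in this card.

```python
import json


def build_trace_prompt(task_instruction, user_request, context, plan_steps, intermediate_answer):
    """Assemble a trace prompt in the layout shown in the usage example."""
    # Number the plan steps "1. ...", "2. ..." as in the example trace.
    plan = "\n".join(f"{i}. {step}" for i, step in enumerate(plan_steps, start=1))
    return (
        f"# Task: {task_instruction}\n\n"
        f"## User Request\n{user_request}\n\n"
        f"## Context\n{json.dumps(context)}\n\n"
        f"## Agent Plan\n{plan}\n\n"
        f"## Intermediate Answer\n{intermediate_answer}\n\n"
        "Based on the above trace, what is the correct classification?"
    )


# Rebuild the example message; the task instruction stays truncated as in the card.
prompt = build_trace_prompt(
    task_instruction="Classify whether this legal matter requires escalation...",
    user_request="Client received notice of regulatory investigation by SEC.",
    context={"jurisdiction": "Federal, USA", "matter": "SEC Investigation"},
    plan_steps=["Assess", "Check escalation policy"],
    intermediate_answer="SEC Section 21(a) inquiry triggers mandatory escalation per policy 4.2(a).",
)
messages = [{"role": "user", "content": prompt}]
```

Whether the other four tasks use exactly the same section headings is an assumption; the training traces in the dataset are the reference for the real format.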
Routing Policy
See routing_policy.py for the full tiered routing logic:
- Tier 1: Micro-Model (this model) - first-pass classification, ~50ms, ~$0.002/call
- Tier 2: SOTA Fallback (Qwen3-8B) - for low-confidence cases (~10-25%)
- Tier 3: Verifier loop - re-run for safety-critical decisions (memory_safety, escalation)
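
The tiers above can be sketched as a small dispatch function. This is not the contents of `routing_policy.py`, which is not reproduced in this card. The 0.9 threshold, the `Decision` type, the callable signatures, and the choice to let the verifier's label win are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Tasks named as safety-critical in the Tier 3 bullet above.
SAFETY_CRITICAL = {"memory_safety", "escalation"}
# Hypothetical cut-off; the card does not state the real threshold.
CONFIDENCE_THRESHOLD = 0.9

# A classifier takes (task, prompt) and returns (label, confidence in [0, 1]).
Classifier = Callable[[str, str], Tuple[str, float]]
# A verifier takes (task, prompt) and returns a label.
Verifier = Callable[[str, str], str]


@dataclass
class Decision:
    label: str
    tier: int      # highest tier that contributed to the decision
    agreed: bool   # False when the verifier overrode the earlier label


def route(task: str, prompt: str, micro: Classifier, fallback: Classifier,
          verifier: Verifier, threshold: float = CONFIDENCE_THRESHOLD) -> Decision:
    # Tier 1: cheap first-pass classification.
    label, confidence = micro(task, prompt)
    tier = 1
    # Tier 2: hand low-confidence cases to the larger model.
    if confidence < threshold:
        label, confidence = fallback(task, prompt)
        tier = 2
    # Tier 3: always re-check safety-critical tasks; the verifier's label wins.
    if task in SAFETY_CRITICAL:
        verified = verifier(task, prompt)
        return Decision(label=verified, tier=3, agreed=(verified == label))
    return Decision(label=label, tier=tier, agreed=True)
```

How the confidence score is obtained is left open here; one common option is the probability mass the model assigns to the label tokens, but the card does not say which method is used.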
Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen3-0.6B (751M params, Apache 2.0) |
| Method | LoRA SFT (r=64, alpha=128, target=all-linear) |
| Dataset | 2,750 synthetic legal-agent traces (v2) |
| Train/Val | 2,337 / 413 |
| Epochs | 3 |
| Learning Rate | 3e-4 (cosine schedule, warmup 5%) |
| Effective Batch | 16 (4 per device × 4 accumulation) |
| Precision | bf16, flash-attn2 (Hub kernel) |
| Loss | Assistant-only cross-entropy on conversational format |
| Training Time | ~16 min on A10G-large |
| Final Eval Loss | 7.4e-06 |
| Final Eval Accuracy | 100% (token) |
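
For a sense of scale, the table's hyperparameters imply a short run. The sketch below works out the optimizer step count and the learning rate at any step, assuming a single device, a partial final batch counted as one step, linear warmup from zero, and cosine decay to zero. The trainer's exact rounding may differ by a step, and none of these step counts are reported in the card itself.

```python
import math

# Values taken from the training table.
TRAIN_EXAMPLES = 2337
EFFECTIVE_BATCH = 16      # 4 per device x 4 accumulation, single device assumed
EPOCHS = 3
PEAK_LR = 3e-4
WARMUP_RATIO = 0.05

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)  # 147
total_steps = steps_per_epoch * EPOCHS                         # 441
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)           # 23


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, f"{lr_at(step):.2e}")
```

With roughly 441 updates over 2,337 examples, the reported ~16 minutes of training is plausible for a 0.6B base model with a LoRA adapter.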
Dataset
Synthetic traces available at:
- narcolepticchicken/legal-agent-traces-v2 (2,750 traces, 50% adversarial/hard)
- narcolepticchicken/legalbench-transfer-eval (16 LegalBench-style transfer samples)
Limitations
- Trained on synthetic data only; may not generalize to real legal scenarios
- 751M params: not suitable for complex legal reasoning; classification/routing only
- English-only legal domain (primarily US/UK jurisdictions)
- The model outputs classification labels from trace context; it does not execute tool calls directly