---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
  - voice-assistant
  - agent
  - lora
  - ssml
  - tool-calling
  - motivational-interviewing
  - whissle
  - peft
datasets:
  - WhissleAI/whissle-agent-llm-training-data
language:
  - en
  - hi
library_name: peft
pipeline_tag: text-generation
model-index:
  - name: whissle-agent-lora-3b-test
    results:
      - task:
          type: text-generation
          name: Structured Agent Response Generation
        metrics:
          - name: JSON Valid
            type: accuracy
            value: 1.0
          - name: Has SSML
            type: accuracy
            value: 1.0
          - name: Has Prosody/Emotion
            type: accuracy
            value: 0.98
          - name: Has Tool Calls
            type: accuracy
            value: 0.66
          - name: Tool Match vs GT
            type: accuracy
            value: 0.545
          - name: Has MI Codes
            type: accuracy
            value: 1.0
          - name: Voice Appropriate Length
            type: accuracy
            value: 0.96
---

# Whissle Agent LLM — LoRA Adapter for Qwen2.5-3B-Instruct

<a href="https://colab.research.google.com/drive/16JaubplO7r_pnlBJosniTZdmZSHMdR4x?usp=sharing" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A **LoRA adapter** fine-tuned on top of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) for the **Whissle AI voice assistant** pipeline.

This model converts structured ASR perception (transcript + emotion + intent + entities) into structured JSON agent responses with:
- **SSML prosody tags** (emotion, rate, pitch, emphasis, break markers) for TTS
- **Tool calls** (calendar, weather, reminders, search, etc.)
- **Motivational Interviewing (MI) codes** for empathetic, behaviorally-informed responses
- **Turn control** markers for conversation flow management
- **Reasoning** field for chain-of-thought (hidden from the user, used for quality monitoring)

Built by [Whissle](https://whissle.ai) as part of the [PromptingNemo](https://github.com/WhissleAI/PromptingNemo) framework.

## Try It

**[Open the Colab notebook](https://colab.research.google.com/drive/16JaubplO7r_pnlBJosniTZdmZSHMdR4x?usp=sharing)** to test the model on a free T4 GPU — no local setup needed.

## Evaluation Results — Base vs LoRA Fine-tuned

| Metric | Base (Qwen 2.5 3B) | LoRA Fine-tuned | Delta |
|---|---|---|---|
| **JSON valid** | 96% | 100% | +4 pp |
| **Has reasoning** | 42% | 96% | +54 pp |
| **Has SSML** | 48% | 100% | +52 pp |
| **Has prosody/emotion** | 0% | 98% | +98 pp |
| **Has break tags** | 8% | 90% | +82 pp |
| **Has tool calls** | 2% | 66% | +64 pp |
| **Tool match (vs GT)** | 0.0% | 54.5% | +54.5 pp |
| **Has MI codes** | 82% | 100% | +18 pp |
| **Voice appropriate** | 70% | 96% | +26 pp |
| **Avg response words** | 27.9 | 19.5 | -8.4 (more concise) |
| **Inference time (50 samples)** | 338s | 414s | +22% slower |
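
The card does not spell out how each metric is computed. As a rough sketch, per-sample checks of this kind could be written with the standard library alone. The function name `score_response` and the tag patterns below are assumptions for illustration and are not the evaluation script behind the table above.

```python
import json
import re

# Hypothetical heuristics; the real evaluation may define these differently.
PROSODY_RE = re.compile(r"<prosody\b[^>]*>", re.IGNORECASE)
BREAK_RE = re.compile(r"<break\b[^>]*/?>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def score_response(raw: str) -> dict:
    """Return per-sample boolean checks plus the spoken word count."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"json_valid": False}
    if not isinstance(data, dict):
        return {"json_valid": False}
    text = str(data.get("response", ""))
    spoken = TAG_RE.sub(" ", text)  # drop markup before counting words
    return {
        "json_valid": True,
        "has_reasoning": bool(data.get("reasoning")),
        "has_ssml": bool(TAG_RE.search(text)),
        "has_prosody": bool(PROSODY_RE.search(text)),
        "has_break": bool(BREAK_RE.search(text)),
        "has_tool_calls": bool(data.get("tool_calls")),
        "has_mi_codes": bool(data.get("mi_codes_used")),
        "response_words": len(spoken.split()),
    }


sample = json.dumps({
    "turn_control": "RESPOND",
    "reasoning": "Weather question.",
    "response": "<prosody emotion='friendly' rate='medium'>Checking the forecast.<break time='200ms'/></prosody>",
    "tool_calls": [{"tool": "get_weather", "parameters": {"location": "San Francisco"}}],
    "mi_codes_used": ["GI"],
})
print(score_response(sample))
```

Averaging each boolean over an evaluation set would give percentages comparable in spirit to the rows above, although the exact numbers depend on the definitions chosen.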

## Training Details

| Parameter | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-3B-Instruct` |
| Method | QLoRA (PEFT) |
| LoRA rank (r) | 32 |
| LoRA alpha | 64 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj` |
| LoRA dropout | 0.05 |
| Epochs | 3 |
| Learning rate | 0.0002 |
| Max sequence length | 2048 |
| Training samples | 5,171 |
| Validation samples | 272 |
| Precision | bf16 |
| Domains | general, finance, sales |
| Hardware | NVIDIA A100 40GB (GKE) |
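
For readers who want to reproduce a similar setup, the adapter hyperparameters in the table map onto a PEFT `LoraConfig` roughly as follows. This is a sketch rather than the training script: `bias` and `task_type` are assumptions because the card does not list them, and the 4-bit quantization settings implied by QLoRA are omitted for the same reason.

```python
from peft import LoraConfig

# r, lora_alpha, lora_dropout and target_modules come from the table above.
# bias and task_type are assumptions; the card does not state them.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# With default settings, PEFT scales the low-rank update by lora_alpha / r.
print(lora_config.lora_alpha / lora_config.r)
```

Only the attention projections are adapted here, so the MLP layers of the base model keep their original weights.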

## How to Use

### Quick Start — LoRA Adapter

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER = "WhissleAI/whissle-agent-lora-3b-test"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

SYSTEM_PROMPT = (
    "You are Lulu, a helpful AI voice assistant by Whissle. "
    "You receive structured perception from the ASR system showing the user's "
    "transcript, emotion, intent, and context. Generate a JSON response with "
    "turn_control, reasoning, response (with SSML prosody tags), tool_calls, "
    "and mi_codes_used. Keep responses under 2 sentences for voice. "
    "Available tools: search_web, set_reminder, check_calendar, send_message, "
    "play_music, get_weather, set_alarm, create_note, make_call, get_directions, "
    "add_to_list, set_timer, translate."
)

perception = {
    "transcript": "What's the weather like in San Francisco?",
    "emotion": "NEUTRAL",
    "speech_act": "QUESTION",
    "generic_intent": "REQUEST",
    "agent_intent": "WEATHER_CHECK",
    "urgency": "LOW",
    "language": "en",
    "domain": "general",
}

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"<|perception|>\n{json.dumps(perception, indent=2)}\n<|/perception|>"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True, top_p=0.9)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
agent_response = json.loads(response)
print(json.dumps(agent_response, indent=2))
```
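
Because sampling is enabled, the decoded text may occasionally carry extra characters around the JSON object, in which case `json.loads(response)` raises. The following is a minimal sketch of a more forgiving parser; `parse_agent_response` is a hypothetical helper and not part of the Whissle codebase.

```python
import json


def parse_agent_response(text: str):
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


raw = 'Sure. {"turn_control": "RESPOND", "response": "Hi."} Done.'
print(parse_agent_response(raw))
```

Returning `None` instead of raising lets the caller decide whether to retry generation or fall back to a default reply.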

### Merge Adapter Weights (for faster inference)

```python
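import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
ADAPTER = "WhissleAI/whissle-agent-lora-3b-test"

# Load the base model in bf16, attach the adapter, then fold the LoRA
# weights into the base weights so no adapter layers run at inference time.
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()
model.save_pretrained("./whissle-agent-merged")

# Save the tokenizer alongside the weights so the directory loads on its own.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)
tokenizer.save_pretrained("./whissle-agent-merged")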
```

### Domain-Specific System Prompts

The model supports three domains, each with a specialized system prompt:

- **General** — Personal assistant (alarms, weather, reminders, search)
- **Finance** — Collections & payments with MI-adherent empathetic handling
- **Sales** — Consultative selling with objection handling

See the [training code](https://github.com/WhissleAI/PromptingNemo/blob/main/recipes/llm_lora/train_agent_lora.py) for the full domain-specific system prompts.

## Response Format

The model outputs structured JSON:

```json
{
  "turn_control": "RESPOND",
  "reasoning": "Simple on/off request. Confirm action.",
  "response": "<prosody emotion='professional' rate='medium'>Turning off the lamp in the living room.</prosody>",
  "mi_codes_used": ["GI"],
  "tool_calls": [
    {
      "tool": "turn_off",
      "parameters": {"device": "living_room_lamp"}
    }
  ]
}
```

### Fields

| Field | Description |
|---|---|
| `turn_control` | `RESPOND`, `SELF`, `YIELD`, `INTERRUPT` — controls conversation flow |
| `reasoning` | Chain-of-thought (hidden from user, used for quality monitoring) |
| `response` | SSML-tagged text for TTS with prosody emotion, rate, pitch, and break markers |
| `tool_calls` | Array of tool invocations, each with a `tool` name and a `parameters` object |
| `mi_codes_used` | Motivational Interviewing behavior codes (AFFIRM, REFLECT, SUPPORT, GI, etc.) |
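
Before acting on a response, a caller may want to confirm that it has the shape described in the table. Here is a small sketch using only the standard library. Treating all five fields as required is an assumption, and `validate_agent_response` is a hypothetical helper.

```python
# Field names and turn_control values are taken from the table above.
REQUIRED_FIELDS = {
    "turn_control": str,
    "reasoning": str,
    "response": str,
    "tool_calls": list,
    "mi_codes_used": list,
}
TURN_CONTROL_VALUES = {"RESPOND", "SELF", "YIELD", "INTERRUPT"}


def validate_agent_response(data: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"{field} should be of type {expected.__name__}")
    turn = data.get("turn_control")
    if isinstance(turn, str) and turn not in TURN_CONTROL_VALUES:
        problems.append(f"unknown turn_control: {turn}")
    calls = data.get("tool_calls")
    if isinstance(calls, list):
        for i, call in enumerate(calls):
            if not isinstance(call, dict) or "tool" not in call:
                problems.append(f"tool_calls[{i}] has no 'tool' name")
    return problems


example = {
    "turn_control": "RESPOND",
    "reasoning": "Simple on/off request. Confirm action.",
    "response": "<prosody emotion='professional' rate='medium'>Turning off the lamp in the living room.</prosody>",
    "mi_codes_used": ["GI"],
    "tool_calls": [{"tool": "turn_off", "parameters": {"device": "living_room_lamp"}}],
}
print(validate_agent_response(example))
```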

## Perception Input Format

The model expects ASR perception as a structured JSON block wrapped between `<|perception|>` and `<|/perception|>` tags:

```json
{
  "transcript": "user's speech transcript",
  "emotion": "NEUTRAL|HAPPY|SAD|ANGRY|FEARFUL|SURPRISED|DISGUSTED",
  "speech_act": "QUESTION|COMMAND|STATEMENT|GREETING|FAREWELL",
  "generic_intent": "REQUEST|INFORM|CONFIRM|DENY|GREET",
  "agent_intent": "WEATHER_CHECK|ALARM_SET|REMINDER_SET|...",
  "urgency": "LOW|MEDIUM|HIGH|CRITICAL",
  "language": "en|hi|...",
  "domain": "general|finance|sales",
  "entities": [{"entity": "type", "value": "extracted_value"}],
  "mi_behavior": "DIRECT|REFLECT|AFFIRM|..."
}
```
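
As an illustration, the user message can be assembled from a perception dictionary with a small helper that also rejects values outside the closed lists shown above. `build_perception_message` is a hypothetical name, and requiring a non-empty `transcript` is an assumption. The open-ended fields (`agent_intent`, `language`, `entities`, `mi_behavior`) are passed through unchecked.

```python
import json

# Closed value sets copied from the format above.
ALLOWED = {
    "emotion": {"NEUTRAL", "HAPPY", "SAD", "ANGRY", "FEARFUL", "SURPRISED", "DISGUSTED"},
    "speech_act": {"QUESTION", "COMMAND", "STATEMENT", "GREETING", "FAREWELL"},
    "generic_intent": {"REQUEST", "INFORM", "CONFIRM", "DENY", "GREET"},
    "urgency": {"LOW", "MEDIUM", "HIGH", "CRITICAL"},
    "domain": {"general", "finance", "sales"},
}


def build_perception_message(perception: dict) -> dict:
    """Validate a perception dict and wrap it as a chat message."""
    if not perception.get("transcript"):
        raise ValueError("perception needs a non-empty 'transcript'")
    for key, allowed in ALLOWED.items():
        value = perception.get(key)
        if value is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
    # Same serialization as the Quick Start example: indented JSON between tags.
    body = json.dumps(perception, indent=2)
    return {"role": "user", "content": f"<|perception|>\n{body}\n<|/perception|>"}


message = build_perception_message({
    "transcript": "Remind me to call mom at five.",
    "emotion": "NEUTRAL",
    "speech_act": "COMMAND",
    "urgency": "LOW",
    "domain": "general",
})
print(message["content"])
```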

## Limitations

- Inference is ~22% slower than the base model (414s vs 338s over 50 samples) due to LoRA adapter overhead; use merged weights for production
- Tool calls match the ground truth in only 54.5% of cases; complex multi-tool scenarios need improvement
- Trained primarily on English and Hinglish; other languages may produce degraded output
- Break tags appear in 90% of responses, so some edge cases may have suboptimal pause timing
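
Given the tool-match figure above, one defensive option is to check every proposed tool call against an allow-list before executing it. The sketch below uses the tool names from the Quick Start system prompt; `filter_tool_calls` is a hypothetical helper, and the right allow-list depends on the system prompt actually deployed.

```python
# Tool names listed in the Quick Start system prompt.
KNOWN_TOOLS = {
    "search_web", "set_reminder", "check_calendar", "send_message",
    "play_music", "get_weather", "set_alarm", "create_note", "make_call",
    "get_directions", "add_to_list", "set_timer", "translate",
}


def filter_tool_calls(tool_calls):
    """Split proposed tool calls into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for call in tool_calls or []:
        name = call.get("tool") if isinstance(call, dict) else None
        params = call.get("parameters", {}) if isinstance(call, dict) else None
        if isinstance(name, str) and name in KNOWN_TOOLS and isinstance(params, dict):
            accepted.append({"tool": name, "parameters": params})
        else:
            rejected.append(call)
    return accepted, rejected


accepted, rejected = filter_tool_calls([
    {"tool": "get_weather", "parameters": {"location": "San Francisco"}},
    {"tool": "turn_off", "parameters": {"device": "living_room_lamp"}},
])
print(accepted)
print(rejected)
```

The `turn_off` call from the Response Format example is rejected here because it is not in the Quick Start tool list, which shows why the set should mirror the tools advertised in the prompt.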

## Citation

```bibtex
@misc{whissle-agent-lora-2026,
  title={Whissle Agent LLM: LoRA Fine-tuned Qwen2.5-3B for Voice AI Agents},
  author={Whissle AI},
  year={2026},
  url={https://huggingface.co/WhissleAI/whissle-agent-lora-3b-test},
  note={Part of the PromptingNemo framework}
}
```

## License

Apache 2.0