---
language:
- en
license: mit
tags:
- intent-classification
- transformer
- virtual-assistant
- nlp
- voice-assistant
- offline-ai
- edge-deployment
metrics:
- accuracy
---

# JaneGPT v2 — Intent Classification Model

A lightweight, fast, and accurate intent classification model 
built from scratch for virtual assistant command understanding.

**7.8M parameters | 22 intent classes | 88.6% validation 
accuracy | ~17ms inference on GPU**

![Loss Curves](assets/janegpt_combined_loss_curves.png)

---

## Why I Built This

I'm building JANE — a fully offline, privacy-first AI voice 
assistant. Running every command through Llama 3 8B caused 
10–22 second delays, even for simple commands like "turn up the volume."

That's not a voice assistant. That's a waiting game.

So I designed JaneGPT v2 from scratch — a model that does 
exactly one job, does it fast, and runs on consumer hardware 
without any cloud dependency.

---

## Model Details

| Property | Value |
|---|---|
| Architecture | Decoder-only Transformer + Classification Head |
| Parameters | ~7.8M |
| Embedding dim | 256 |
| Attention heads | 8 |
| KV heads (GQA) | 4 |
| Layers | 8 |
| FF hidden dim | 672 |
| Max sequence length | 256 |
| Vocab size | 8,192 |
| Tokenizer | Custom BPE |
| Training accuracy | ~96.7% |
| Validation accuracy | 88.6% |
| Checkpoint size | ~30MB |
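
As a rough sanity check, the figures in the table can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes bias-free linear layers, a head dimension of 256 / 8 = 32, two RMSNorm weight vectors per layer plus a final norm, and a single linear classification head with bias. None of these details are stated above, so treat the result as an estimate, not the exact layout.

```python
# Estimate the parameter count from the Model Details table.
# Assumptions (not confirmed by this model card): no biases in the
# attention/FFN projections, SwiGLU uses three weight matrices, RoPE has
# no learned weights, and the classifier is one linear layer with bias.

D_MODEL, N_HEADS, N_KV_HEADS = 256, 8, 4
N_LAYERS, FF_DIM, VOCAB, N_CLASSES = 8, 672, 8192, 22

head_dim = D_MODEL // N_HEADS      # 32
kv_dim = N_KV_HEADS * head_dim     # 128: GQA shrinks the K and V projections

embedding = VOCAB * D_MODEL
attention = 2 * D_MODEL * D_MODEL + 2 * D_MODEL * kv_dim  # Q and O, then K and V
feed_forward = 3 * D_MODEL * FF_DIM                        # gate, up, down
norms = 2 * D_MODEL                                        # two RMSNorms per layer
per_layer = attention + feed_forward + norms

head = D_MODEL * N_CLASSES + N_CLASSES                     # weights + bias
total = embedding + N_LAYERS * per_layer + D_MODEL + head  # D_MODEL = final norm

print(f"total parameters: {total:,}")
print(f"fp32 size: {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the estimate comes to about 7.8M parameters and roughly 31 MB in fp32, which is in line with the parameter count and checkpoint size listed above.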

---

## Architecture Decisions & Why

| Choice | Reason |
|---|---|
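| **GQA** (4 KV heads, 8 attention heads) | Each KV head is shared by two query heads, roughly halving K/V projection weights and K/V activation memory with little loss in expressiveness |
| **RoPE** positional encoding | Encodes relative position by rotating queries and keys; tends to generalize across lengths better than learned absolute embeddings |
| **SwiGLU** activation | Gated, smooth activation; chosen for smoother gradients than ReLU at this model size |
| **RMSNorm** | Skips the mean subtraction and bias of LayerNorm, so it is simpler and slightly cheaper |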
| **Custom BPE tokenizer** | Trained specifically on command-style text |
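
To make the RMSNorm row concrete, here is a minimal pure-Python sketch comparing it with LayerNorm on a single vector. The function names are illustrative and this is not the model's actual implementation, which presumably operates on PyTorch tensors.

```python
import math

def layer_norm(x, eps=1e-6):
    """LayerNorm without learned affine terms: center, then divide by std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, weight=None, eps=1e-6):
    """RMSNorm: divide by the root mean square; no mean subtraction, no bias."""
    if weight is None:
        weight = [1.0] * len(x)  # identity gain, as at initialization
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

x = [1.0, 2.0, 3.0, 4.0]
y_rms = rms_norm(x)
y_ln = layer_norm(x)

# RMSNorm output has unit mean square; LayerNorm output has zero mean.
mean_square = sum(v * v for v in y_rms) / len(y_rms)
ln_mean = sum(y_ln) / len(y_ln)
print(f"RMSNorm mean square: {mean_square:.4f}, LayerNorm mean: {ln_mean:.4f}")
```

RMSNorm needs one reduction over the vector where LayerNorm needs two (mean and variance), which is where the speed advantage comes from.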

---

## Supported Intents (22 classes)

| Category | Intents |
|---|---|
| Volume | `volume_up`, `volume_down`, `volume_set`, `volume_mute` |
| Brightness | `brightness_up`, `brightness_down`, `brightness_set` |
| Media | `media_play`, `media_pause`, `media_next`, `media_previous` |
| Apps | `app_launch`, `app_close`, `app_switch` |
| Browser | `browser_search` |
| Productivity | `set_reminder`, `screenshot` |
| Screen | `read_screen`, `explain_screen` |
| Control | `undo`, `quit_jane` |
| Conversation | `chat` |
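
For command routing, the table above can be mirrored as a small lookup structure. This is a sketch only: `INTENTS_BY_CATEGORY`, `CATEGORY_OF`, and `route` are hypothetical names, not part of the repository.

```python
# Intent labels grouped exactly as in the Supported Intents table.
INTENTS_BY_CATEGORY = {
    "Volume": ["volume_up", "volume_down", "volume_set", "volume_mute"],
    "Brightness": ["brightness_up", "brightness_down", "brightness_set"],
    "Media": ["media_play", "media_pause", "media_next", "media_previous"],
    "Apps": ["app_launch", "app_close", "app_switch"],
    "Browser": ["browser_search"],
    "Productivity": ["set_reminder", "screenshot"],
    "Screen": ["read_screen", "explain_screen"],
    "Control": ["undo", "quit_jane"],
    "Conversation": ["chat"],
}

# Reverse index: intent label -> category.
CATEGORY_OF = {
    intent: category
    for category, intents in INTENTS_BY_CATEGORY.items()
    for intent in intents
}

def route(intent: str) -> str:
    """Return the category whose handler should receive this intent."""
    if intent not in CATEGORY_OF:
        raise ValueError(f"unknown intent label: {intent!r}")
    return CATEGORY_OF[intent]

print(len(CATEGORY_OF), route("media_next"))
```

Grouping by category lets one handler per subsystem (volume, media, apps) receive all of its intents, instead of registering a separate handler for each of the 22 labels.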

---

## Performance

| Input | Predicted Intent | Confidence |
|---|---|---|
| "increase the volume" | volume_up | 86% |
| "make it louder" | volume_up | 90% |
| "turn down the brightness" | brightness_down | 80% |
| "open chrome" | app_launch | 98% |
| "play some music" | media_play | 96% |
| "search for cats on youtube" | browser_search | 94% |
| "set a reminder for 5 minutes" | set_reminder | 96% |
| "take a screenshot" | screenshot | 88% |
| "undo that" | undo | 92% |
| "hello" | chat | 97% |

---

## Quick Start

### Installation
```bash
git clone https://huggingface.co/RavinduSen/JaneGPT-v2
cd JaneGPT-v2
pip install -r requirements.txt
```

### Basic Usage
```python
from classifier import JaneGPTClassifier

classifier = JaneGPTClassifier()

intent, confidence = classifier.predict("turn up the volume")
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: volume_up, Confidence: 86.10%

intent, confidence = classifier.predict("open chrome")
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: app_launch, Confidence: 98.10%
```

### With Conversation Context
```python
intent, confidence = classifier.predict(
    "not enough",
    context={"last_intent": "volume_up"}
)
print(f"Intent: {intent}, Confidence: {confidence:.2%}")
# Output: Intent: volume_up, Confidence: 79.00%
```
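
To check latency on your own hardware, a small timing harness along these lines may help. It is a sketch: `measure_latency_ms` is a hypothetical helper, and `fake_predict` is a stand-in so the snippet runs without the model. In practice you would pass `classifier.predict` instead.

```python
import statistics
import time

def measure_latency_ms(predict, text, warmup=3, runs=20):
    """Return the median wall-clock latency of predict(text) in milliseconds."""
    for _ in range(warmup):
        predict(text)  # warm-up calls are excluded from the timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def fake_predict(text):
    """Stand-in with the same return shape as predict: (intent, confidence)."""
    return "volume_up", 0.86

latency = measure_latency_ms(fake_predict, "turn up the volume")
print(f"median latency: {latency:.3f} ms")
```

The median is less sensitive to one-off stalls than the mean. If a predictor returned before its GPU work had finished, a call such as `torch.cuda.synchronize()` before reading the clock would be needed for accurate numbers.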

---

## Training Setup

| Component | Details |
|---|---|
| Hardware | NVIDIA RTX 3050 Ti (4GB VRAM) |
| CPU | AMD Ryzen 9 5900HX |
| RAM | 16GB |
| Additional | Google Colab (extended training runs) |
| Framework | PyTorch 2.0+ |
| Training data | Custom command dataset (Claude-assisted generation under author supervision) |

---

## Limitations

- Intent classification only: the model does not generate text
- Fixed set of 22 classes: commands outside the supported set are classified as `chat`
- English only
- Optimized for short inputs (1–15 words)
- No entity extraction: the output is an intent label and a confidence score, with no slot values such as a target volume or app name
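
Two of these limitations can be partly worked around in the calling code. The sketch below is one possible approach, not something shipped with the model: it falls back to `chat` when confidence is low and pulls a number out of the text for the `*_set` intents. The 0.6 threshold, the 0–100 clamp, and the `resolve` helper are all assumptions for illustration.

```python
import re

CONFIDENCE_THRESHOLD = 0.6  # hypothetical cut-off; tune on your own data
SET_INTENTS = ("volume_set", "brightness_set")

def resolve(intent, confidence, text):
    """Post-process a prediction into (intent, value).

    Low-confidence predictions fall back to `chat`. For *_set intents the
    first integer in the text is used as the value, clamped to 0-100.
    """
    if confidence < CONFIDENCE_THRESHOLD:
        return "chat", None
    value = None
    if intent in SET_INTENTS:
        match = re.search(r"\d+", text)
        if match:
            value = max(0, min(100, int(match.group())))
    return intent, value

print(resolve("volume_set", 0.91, "set volume to 50"))
print(resolve("app_launch", 0.30, "open the thing"))
```

A regex is enough for simple numeric slots. Free-form slots such as app names or search queries would need a separate extraction step.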

---

## Use Cases

- Virtual assistant command routing
- Smart home intent classification
- Voice command understanding
- Chatbot intent detection
- Edge device deployment (small enough for embedded systems)

---

## Part of the JANE Project

This model is the intelligence core of **JANE** — a fully 
offline, privacy-first AI voice assistant.

- 🔗 [JANE AI Assistant on GitHub](https://github.com/Ravindu-S/JANE-AI-Assistant)
- 🔗 [JaneGPT-v2 on GitHub](https://github.com/Ravindu-S/JaneGPT-v2)

---

## Created By

**Ravindu Senanayake** — Computer Science Undergraduate, Sri Lanka

Built from scratch — architecture, tokenizer, and training 
pipeline designed and implemented by the author.

[![GitHub](https://img.shields.io/badge/GitHub-Ravindu--S-black?logo=github)](https://github.com/Ravindu-S)