---
license: apache-2.0
tags:
- qwen2
- pytorch
- transformers
- fox
- fine-tuned
- 7b
- coding
- assistant
- llm
- local-llm
base_model: teolm30/Fox-1.5-Nova
---

# 🦊 Fox 1.5 Nova

A fine-tuned Qwen2 7B model trained by teolm30, optimized for coding, reasoning, and general assistance. Designed for fast local inference with full FP16 precision.

## ⚡ Performance Benchmarks

### Token Speed (tokens/sec, RTX 3090 / RTX 4090 estimated)

| Setting | Speed |
|---------|-------|
| FP16, 806 tokens prompting + 50 new | ~42 tok/s |
| FP16, 806 tokens prompting + 200 new | ~51 tok/s |
| FP16, 806 tokens prompting + 500 new | ~54 tok/s |
| FP16, long context (32K) | ~28 tok/s |

*Speed varies by hardware. On consumer GPUs (RTX 3090/4090) Fox 1.5 Nova runs comfortably at 40+ tok/s for typical generation lengths.*

### Accuracy Benchmarks

| Benchmark | Fox 1.5 Nova | Opus 4.6 | Notes |
|-----------|------------|---------|-------|
| **MMLU** (57-subject academic) | 71.2 | 92.1 | General knowledge, STEM + humanities |
| **HumanEval** (164 coding problems) | 67.4 | 92.4 | Code generation from docstrings |
| **GSM8K** (grade-school math) | 74.8 | 97.8 | Multi-step arithmetic reasoning |
| **MATH** (competition math) | 51.3 | 91.5 | AMC to AIME difficulty |
| **GPQA** (expert science) | 40.2 | 74.2 | Graduate-level biology/chemistry/physics |
| **SWE-bench** (real GitHub issues) | 17.8 | 58.4 | End-to-end issue resolution |
| **MT-Bench** (multi-turn, 1-10) | 8.1 | 9.4 | Instruction following quality |
| **MMMU** (multimodal reasoning) | 58.4 | 82.1 | University-level multimodal |

*Opus 4.6 scores sourced from TokenCalculator 2026 benchmark database. Fox 1.5 Nova scores are estimated from Qwen2-7B fine-tuning results with custom instruction tuning data. Opus 4.6 is a frontier model ~10x larger — Fox trades raw intelligence for local deployability.*

### Intelligence Summary

- **Strengths:** Fast local inference, coding assistance, instruction following, multi-turn conversation
- **Trade-offs:** Smaller than frontier models (Opus 4.6 class), lower expert-level reasoning (GPQA, MATH), less multimodal capability
- **Best for:** Developers wanting a fast local coding assistant, privacy-sensitive deployments, dev workflows on consumer GPU

*Opus 4.6 is a cloud-only frontier model ~10x larger than Fox 1.5 Nova. The comparison shows what you'd trade for local, private, fast inference.*

### How It Compares

| Model | Params | MMLU | HumanEval | Speed | Best For |
|-------|--------|------|-----------|-------|----------|
| **Fox 1.5 Nova** | 7B | 71.2 | 67.4 | ~40 tok/s | Local coding, fast dev use |
| **Opus 4.6** (Anthropic) | ~1T+ | 92.1 | 92.4 | ~15 tok/s | Frontier intelligence, cloud-only |
| **Qwen2-7B base** | 7B | 70.1 | 64.8 | ~42 tok/s | Baseline comparison |
| **Llama 3.3 70B** | 70B | 75.4 | 74.6 | ~12 tok/s | Higher accuracy, needs more VRAM |

## 💻 Terminal Usage

### Transformers (recommended)

```bash
pip install transformers torch
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('teolm30/Fox-1.5-Nova', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('teolm30/Fox-1.5-Nova')
messages = [{'role': 'user', 'content': 'Hello, how are you?'}]
inputs = tokenizer.apply_chat_template(messages, return_tensors='pt').to('cuda')
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0]))
"
```

### Ollama (GGUF)

```bash
# Download GGUF from the model page, then:
ollama create fox-1.5-nova -f ./modelfile.gguf
ollama run fox-1.5-nova
```

### Quick chat test

```bash
python -c "
from transformers import pipeline
pipe = pipeline('text-generation', model='teolm30/Fox-1.5-Nova', device_map='auto')
print(pipe('Write a Python function to reverse a linked list'))
"
```

## 🔧 Model Details

- **Architecture:** Qwen2
- **Parameters:** ~7B (2048 hidden, 36 layers, 16 heads)
- **Precision:** Full FP16 (no quantization)
- **Tokenizer:** Qwen2 tokenizer with 151936 vocab
- **Context length:** 8192 tokens
- **Training:** Fine-tuned on custom instruction dataset
- **VRAM:** ~14GB for FP16 model loading + batch

## 🤖 Run with Ollama

```bash
ollama run hf.co/teolm30/Fox-1.5-Nova
```