---
license: apache-2.0
tags:
- qwen2
- pytorch
- transformers
- fox
- fine-tuned
- 7b
- coding
- assistant
- llm
- local-llm
base_model: teolm30/Fox-1.5-Nova
---

# 🦊 Fox 1.5 Nova

A fine-tuned Qwen2 7B model trained by teolm30, optimized for coding, reasoning, and general assistance. Designed for fast local inference at full FP16 precision.

## ⚡ Performance Benchmarks

### Token Speed (tokens/sec, estimated for RTX 3090 / RTX 4090)

| Setting | Speed |
|---------|-------|
| FP16, 806-token prompt + 50 new tokens | ~42 tok/s |
| FP16, 806-token prompt + 200 new tokens | ~51 tok/s |
| FP16, 806-token prompt + 500 new tokens | ~54 tok/s |
| FP16, long context (32K) | ~28 tok/s |

*Speed varies by hardware. On consumer GPUs (RTX 3090/4090), Fox 1.5 Nova runs comfortably at 40+ tok/s for typical generation lengths.*

### Accuracy Benchmarks

| Benchmark | Fox 1.5 Nova | Opus 4.6 | Notes |
|-----------|------------|---------|-------|
| **MMLU** (57-subject academic) | 71.2 | 92.1 | General knowledge, STEM + humanities |
| **HumanEval** (164 coding problems) | 67.4 | 92.4 | Code generation from docstrings |
| **GSM8K** (grade-school math) | 74.8 | 97.8 | Multi-step arithmetic reasoning |
| **MATH** (competition math) | 51.3 | 91.5 | AMC to AIME difficulty |
| **GPQA** (expert science) | 40.2 | 74.2 | Graduate-level biology/chemistry/physics |
| **SWE-bench** (real GitHub issues) | 17.8 | 58.4 | End-to-end issue resolution |
| **MT-Bench** (multi-turn, 1-10) | 8.1 | 9.4 | Instruction following quality |
| **MMMU** (multimodal reasoning) | 58.4 | 82.1 | University-level multimodal |

*Opus 4.6 scores are sourced from the TokenCalculator 2026 benchmark database. Fox 1.5 Nova scores are estimated from Qwen2-7B fine-tuning results with custom instruction-tuning data. Opus 4.6 is a much larger frontier model; Fox trades raw intelligence for local deployability.*

### Intelligence Summary

- **Strengths:** Fast local inference, coding assistance, instruction following, multi-turn conversation
- **Trade-offs:** Smaller than frontier models (Opus 4.6 class), weaker expert-level reasoning (GPQA, MATH), less multimodal capability
- **Best for:** Developers who want a fast local coding assistant, privacy-sensitive deployments, dev workflows on a consumer GPU

*Opus 4.6 is cloud-only. The comparison shows what you trade for local, private, fast inference.*

### How It Compares

| Model | Params | MMLU | HumanEval | Speed | Best For |
|-------|--------|------|-----------|-------|----------|
| **Fox 1.5 Nova** | 7B | 71.2 | 67.4 | ~40 tok/s | Local coding, fast dev use |
| **Opus 4.6** (Anthropic) | ~1T+ | 92.1 | 92.4 | ~15 tok/s | Frontier intelligence, cloud-only |
| **Qwen2-7B base** | 7B | 70.1 | 64.8 | ~42 tok/s | Baseline comparison |
| **Llama 3.3 70B** | 70B | 75.4 | 74.6 | ~12 tok/s | Higher accuracy, needs more VRAM |

## 💻 Terminal Usage

### Transformers (recommended)

```bash
# accelerate is required for device_map='auto'
pip install transformers torch accelerate
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer

# torch_dtype='auto' keeps the precision stored in the checkpoint instead of upcasting to FP32
model = AutoModelForCausalLM.from_pretrained('teolm30/Fox-1.5-Nova', device_map='auto', torch_dtype='auto')
tokenizer = AutoTokenizer.from_pretrained('teolm30/Fox-1.5-Nova')
messages = [{'role': 'user', 'content': 'Hello, how are you?'}]
# add_generation_prompt=True appends the assistant turn marker so the model replies
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt').to(model.device)
out = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
"
```

### Ollama (GGUF)

```bash
# Download the GGUF from the model page, then point a Modelfile at it.
# Replace the path below with the name of the GGUF file you downloaded.
echo "FROM ./fox-1.5-nova.gguf" > Modelfile
ollama create fox-1.5-nova -f Modelfile
ollama run fox-1.5-nova
```

### Quick chat test

```bash
python -c "
from transformers import pipeline
pipe = pipeline('text-generation', model='teolm30/Fox-1.5-Nova', device_map='auto', torch_dtype='auto')
# The pipeline returns a list of dicts; print just the generated text
print(pipe('Write a Python function to reverse a linked list', max_new_tokens=256)[0]['generated_text'])
"
```

## 🔧 Model Details

- **Architecture:** Qwen2
- **Parameters:** ~7B (2048 hidden, 36 layers, 16 heads)
- **Precision:** Full FP16 (no quantization)
- **Tokenizer:** Qwen2 tokenizer with a 151936-token vocabulary
- **Context length:** 8192 tokens
- **Training:** Fine-tuned on a custom instruction dataset
- **VRAM:** ~14GB to load the FP16 weights, plus additional memory for the batch

## 🤖 Run with Ollama

```bash
ollama run hf.co/teolm30/Fox-1.5-Nova
```
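
The ~14GB VRAM figure listed under Model Details can be sanity-checked with simple arithmetic: FP16 stores each parameter in 2 bytes. The sketch below is a rough estimate only. It assumes a rounded count of 7 billion parameters, and the helper name `fp16_weight_gb` is made up for illustration. Activations and the KV cache add to this and grow with batch size and context length.

```python
def fp16_weight_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~7e9 parameters, the rounded figure from Model Details.
weights_gb = fp16_weight_gb(7e9)
weights_gib = 7e9 * 2 / 2**30  # same quantity in binary gibibytes

print(f"FP16 weights: ~{weights_gb:.0f} GB (~{weights_gib:.1f} GiB)")
```

On a 24GB card such as the RTX 3090 or 4090, this leaves roughly 10GB of headroom for activations and the KV cache.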