---
library_name: transformers
tags:
- quantized
- hybrid
language:
- ru
- en
---
# Vikra MixedPrc
A 12.25B-parameter Mistral-based language model quantized with a custom mixed-precision (hybrid) scheme.
## Model Details
| Property | Value |
|---|---|
| **Architecture** | Mistral (12.25B params, 40 layers) |
| **Hidden size** | 5120 |
| **Attention heads** | 32 (8 KV heads, GQA) |
| **Intermediate size** | 14336 |
| **Context length** | 1,024,000 tokens |
| **Vocabulary** | 131,072 tokens (Tekken BPE) |
| **RoPE theta** | 1,000,000.0 |
## Quantization: MixP_4.9b_S
Custom mixed-precision quantization scheme with per-tensor type assignment.
| Tensor group | Quant type | Bits per weight (BPW) |
|---|---|---|
| `token_embd`, `output` | BF16 | 16.00 |
| `attn_norm`, `ffn_norm`, `output_norm` | F32 | 32.00 |
| `attn_q` | Q4_K | 4.50 |
| `attn_k` | Q5_K | 5.50 |
| `attn_v` | Q3_K | 3.44 |
| `attn_output` | Q4_K | 4.50 |
| `ffn_gate` | Q3_K | 3.44 |
| `ffn_up` | Q5_K | 5.50 |
| `ffn_down` | Q5_K / Q6_K (last layers) | 5.50–6.56 |
**Overall: 6.11 BPW | Quantized layers only: 4.89 BPW | File size: 8.71 GB**
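
As a sanity check on the summary line, the figures can be roughly reproduced from the parameter count and the tables above. This is a back-of-the-envelope sketch, not the tool that produced the file. It assumes that "GB" here means GiB (2^30 bytes), that `token_embd` and `output` are two separate 131,072 × 5120 matrices, and that the F32 norm tensors and GGUF metadata are small enough to ignore. The names `size_gib` and `quant_bpw` are illustrative only.

```python
GIB = 2**30  # assumption: the card's "GB" figures are GiB

N_PARAMS = 12.25e9            # total parameters, from Model Details
VOCAB, HIDDEN = 131_072, 5120 # vocabulary and hidden size, from Model Details
OVERALL_BPW = 6.11            # overall bits per weight, from the summary line


def size_gib(n_params: float, bpw: float) -> float:
    """Approximate payload size in GiB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / GIB


# Assumption: token_embd and output are untied, both stored in BF16 (16 bits).
bf16_params = 2 * VOCAB * HIDDEN
quant_params = N_PARAMS - bf16_params

# Remove the BF16 bits from the total to estimate the average BPW
# over the remaining (quantized) tensors. Norm tensors are ignored.
quant_bpw = (N_PARAMS * OVERALL_BPW - bf16_params * 16.0) / quant_params

print(f"quantized file: {size_gib(N_PARAMS, OVERALL_BPW):.2f} GiB")
print(f"BF16 baseline:  {size_gib(N_PARAMS, 16.0):.2f} GiB")
print(f"quantized-only: {quant_bpw:.2f} BPW")
```

Under these assumptions the sketch lands on roughly 8.71 GiB and 4.89 BPW, in line with the figures above, and the BF16 estimate comes out near 22.8 GiB.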
## Perplexity
Measured on the full **wikitext-2-raw-test** set (73 chunks with a 4,096-token context, 299,008 tokens evaluated). Lower is better:
| Model | Precision | Size | PPL |
|---|---|---|---|
| **Vikra MixP_4.9b_S** | **6.11 BPW** | **8.71 GB** | **5.5000 ± 0.032** |
| Vikhr-Nemo-12B-Instruct (baseline) | BF16 | 22.81 GB | 6.0212 ± 0.034 |
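
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows that standard definition and checks the token count quoted above. It is not the evaluation code used for this table, and the `perplexity` helper and its toy input are illustrative only.

```python
import math


def perplexity(token_logprobs):
    """PPL = exp(-(1/N) * sum of natural-log probabilities of each token)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Token count from the evaluation setup: 73 chunks of 4096 tokens each.
N_CHUNKS, CTX = 73, 4096
tokens_evaluated = N_CHUNKS * CTX
print(tokens_evaluated)

# Toy input: a model that assigns probability 1/4 to every token
# has a perplexity of about 4 (it is "as uncertain as" a 4-way guess).
toy_logprobs = [math.log(0.25)] * 8
print(perplexity(toy_logprobs))
```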
## Chat Template
The chat template is embedded in the GGUF file:
```
<|start_header_id|>system<|end_header_id|>
{system_message}</s><|start_header_id|>user<|end_header_id|>
{user_message}</s><|start_header_id|>assistant<|end_header_id|>
```
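
If you need to build the prompt string yourself (for example, when calling a raw completion endpoint), the layout above can be filled in as follows. This is a minimal sketch that mirrors the template exactly as printed here, including its line breaks. The template stored in the GGUF remains authoritative, and `build_prompt` is a hypothetical helper, not part of any library.

```python
def build_prompt(system_message: str, user_message: str) -> str:
    """Fill the single-turn template shown above with the given messages.

    Whitespace follows the card's rendering of the template; if the
    template embedded in the GGUF differs, prefer that one.
    """
    return (
        "<|start_header_id|>system<|end_header_id|>\n"
        f"{system_message}</s>"
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}</s>"
        "<|start_header_id|>assistant<|end_header_id|>"
    )


prompt = build_prompt("You are a helpful assistant.", "Привет!")
print(prompt)
```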
## Usage
```bash
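# llama.cpp server: offload 25 of the 40 layers to the GPU (-ngl), 4096-token context (-c)
llama-server -m Vikra-MixP_4.9b_S.gguf -ngl 25 -c 4096
# llama.cpp CLI in conversation mode (-cnv), same offload and context settings
llama-cli -m Vikra-MixP_4.9b_S.gguf -ngl 25 -c 4096 -cnv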
```
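
Once `llama-server` is running, it can be queried over its OpenAI-compatible HTTP API, which applies the built-in chat template on the server side. The sketch below uses only the Python standard library. It assumes the server listens on its default address `http://localhost:8080`, and the helper names `build_chat_request` and `ask` as well as the `max_tokens` value are illustrative choices.

```python
import json
import urllib.request


def build_chat_request(user_message, system_message,
                       base_url="http://localhost:8080"):
    """Build a POST request for llama-server's /v1/chat/completions endpoint."""
    payload = {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 256,  # illustrative cap on the reply length
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_message, system_message="You are a helpful assistant."):
    """Send one chat turn to a running server and return the reply text."""
    request = build_chat_request(user_message, system_message)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server from the command above running, `print(ask("Привет!"))` should print the model's reply. Pass a different `base_url` to `build_chat_request` if you started the server with a custom host or port.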