---
library_name: transformers
tags:
- quantized
- hybrid
language:
- ru
- en
---

# Vikra MixedPrc

12.25B parameter Mistral-based language model with mixed-precision hybrid quantization.

## Model Details

| Property | Value |
|---|---|
| **Architecture** | Mistral (12.25B params, 40 layers) |
| **Hidden size** | 5120 |
| **Attention heads** | 32 (8 KV heads, GQA) |
| **Intermediate size** | 14336 |
| **Context length** | 1,024,000 tokens |
| **Vocabulary** | 131,072 tokens (Tekken BPE) |
| **RoPE theta** | 1,000,000.0 |

## Quantization: MixP_4.9b_S

Custom mixed-precision quantization scheme with per-tensor type assignment.

| Tensor group | Quant type | BPW |
|---|---|---|
| `token_embd`, `output` | BF16 | 16.00 |
| `attn_norm`, `ffn_norm`, `output_norm` | F32 | 32.00 |
| `attn_q` | Q4_K | 4.50 |
| `attn_k` | Q5_K | 5.50 |
| `attn_v` | Q3_K | 3.44 |
| `attn_output` | Q4_K | 4.50 |
| `ffn_gate` | Q3_K | 3.44 |
| `ffn_up` | Q5_K | 5.50 |
| `ffn_down` | Q5_K / Q6_K (last layers) | 5.50–6.56 |

**Overall: 6.11 BPW | Quantized layers only: 4.89 BPW | File size: 8.71 GB**

## Perplexity

Measured on **wikitext-2-raw-test** (full dataset, 73 chunks, context 4096 tokens, 299,008 tokens evaluated):

| Model | Precision | Size | PPL |
|---|---|---|---|
| **Vikra MixP_4.9b_S** | **6.11 BPW** | **8.71 GB** | **5.5000 ± 0.032** |
| Vikhr-Nemo-12B-Instruct (baseline) | BF16 | 22.81 GB | 6.0212 ± 0.034 |

## Chat Template

Built-in chat template (baked into GGUF):

```
<|start_header_id|>system<|end_header_id|>
{system_message}<|start_header_id|>user<|end_header_id|>
{user_message}<|start_header_id|>assistant<|end_header_id|>
```

## Usage

```bash
# llama.cpp server
llama-server -m Vikra-MixP_4.9b_S.gguf -ngl 25 -c 4096

# llama.cpp CLI
llama-cli -m Vikra-MixP_4.9b_S.gguf -ngl 25 -c 4096 -cnv
```
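
The size and token figures quoted in the Quantization and Perplexity sections can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the card does not state: that "GB" means GiB (2^30 bytes), and that the parameter count is close enough to 12.25e9 for a two-decimal estimate. The helper name `file_size_gib` is hypothetical, and GGUF metadata overhead is ignored.

```python
GIB = 1024 ** 3  # assumption: the card's "GB" figures are GiB (2**30 bytes)


def file_size_gib(n_params: float, bpw: float) -> float:
    """Approximate weight size in GiB from parameter count and bits per weight.

    Ignores GGUF header and metadata, which are small next to the tensors.
    """
    return n_params * bpw / 8 / GIB


N_PARAMS = 12.25e9  # rounded parameter count from the Model Details table

quant_gib = file_size_gib(N_PARAMS, 6.11)  # overall BPW of MixP_4.9b_S
bf16_gib = file_size_gib(N_PARAMS, 16.0)   # BF16 baseline

# Perplexity run: 73 chunks at a context of 4096 tokens each.
tokens_evaluated = 73 * 4096

print(f"quantized: {quant_gib:.2f} GiB")
print(f"bf16:      {bf16_gib:.2f} GiB")
print(f"tokens:    {tokens_evaluated}")
```

Under these assumptions the quantized estimate comes out at about 8.71 GiB and the token count at 299,008, matching the tables. The BF16 estimate lands within roughly 0.01 GiB of the listed 22.81 GB, a gap that is consistent with the parameter count having been rounded.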