srs6901 committed on
Commit
9f618d1
·
verified ·
1 Parent(s): 94e7031

Upload README.md

  1. README.md +71 -3
README.md CHANGED
@@ -1,3 +1,71 @@
- ---
- license: mit
- ---
+ ---
+ library_name: transformers
+ tags:
+ - quantized
+ - hybrid
+ language:
+ - ru
+ - en
+ ---
+ # Vikra MixedPrc
+
+ A 12.25B-parameter Mistral-based language model with mixed-precision hybrid quantization.
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | **Architecture** | Mistral (12.25B params, 40 layers) |
+ | **Hidden size** | 5120 |
+ | **Attention heads** | 32 (8 KV heads, GQA) |
+ | **Intermediate size** | 14336 |
+ | **Context length** | 1,024,000 tokens |
+ | **Vocabulary** | 131,072 tokens (Tekken BPE) |
+ | **RoPE theta** | 1,000,000.0 |
+
+ ## Quantization: MixP_4.9b_S
+
+ Custom mixed-precision quantization scheme with per-tensor type assignment.
+
+ | Tensor group | Quant type | BPW |
+ |---|---|---|
+ | `token_embd`, `output` | BF16 | 16.00 |
+ | `attn_norm`, `ffn_norm`, `output_norm` | F32 | 32.00 |
+ | `attn_q` | Q4_K | 4.50 |
+ | `attn_k` | Q5_K | 5.50 |
+ | `attn_v` | Q3_K | 3.44 |
+ | `attn_output` | Q4_K | 4.50 |
+ | `ffn_gate` | Q3_K | 3.44 |
+ | `ffn_up` | Q5_K | 5.50 |
+ | `ffn_down` | Q5_K / Q6_K (last layers) | 5.50–6.56 |
+
+ **Overall: 6.11 BPW | Quantized layers only: 4.89 BPW | File size: 8.71 GB**
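
The overall figures can be roughly cross-checked from the numbers in the two tables. The sketch below is not the author's tooling. It assumes that `token_embd` and `output` each hold vocabulary × hidden-size weights at 16 BPW, that every remaining parameter averages the reported 4.89 BPW (the small F32 norm tensors are ignored), and that the reported "GB" are binary gigabytes (2^30 bytes). The function name `overall_bpw_and_size` is hypothetical.

```python
def overall_bpw_and_size(total_params, vocab, hidden, quant_bpw, embed_bpw=16.0):
    """Estimate overall bits per weight (BPW) and file size in GiB.

    Assumption: two BF16 tensors of shape (vocab, hidden) are kept at
    embed_bpw, and all remaining parameters average quant_bpw.
    """
    embed_params = 2 * vocab * hidden            # token_embd + output
    quant_params = total_params - embed_params   # everything else (norms ignored)
    total_bits = embed_params * embed_bpw + quant_params * quant_bpw
    bpw = total_bits / total_params
    size_gib = total_bits / 8 / 2**30            # bits -> bytes -> GiB
    return bpw, size_gib


bpw, size_gib = overall_bpw_and_size(
    total_params=12.25e9, vocab=131_072, hidden=5120, quant_bpw=4.89
)
print(f"overall: {bpw:.2f} BPW, about {size_gib:.2f} GiB")
```

With the rounded inputs above, the estimate lands within rounding of the reported 6.11 BPW and 8.71 GB, which suggests the file size is expressed in binary units.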
+
+ ## Perplexity
+
+ Measured on **wikitext-2-raw-test** (full dataset, 73 chunks, context 4096 tokens, 299,008 tokens evaluated):
+
+ | Model | Precision | Size | PPL |
+ |---|---|---|---|
+ | **Vikra MixP_4.9b_S** | **6.11 BPW** | **8.71 GB** | **5.5000 ± 0.032** |
+ | Vikhr-Nemo-12B-Instruct (baseline) | BF16 | 22.81 GB | 6.0212 ± 0.034 |
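
As a small sanity check on the evaluation setup, the token count should equal chunks × context, and the two reported PPL means can be compared as a relative change. This is only illustrative arithmetic on the figures quoted above. The helper names are hypothetical, and the comparison ignores the ± uncertainties.

```python
def evaluated_tokens(chunks, context):
    """Total tokens scored when every chunk is a full context window."""
    return chunks * context


def relative_ppl_change(ppl, baseline_ppl):
    """Relative change of ppl versus baseline_ppl (negative means lower PPL)."""
    return (ppl - baseline_ppl) / baseline_ppl


tokens = evaluated_tokens(chunks=73, context=4096)
change = relative_ppl_change(ppl=5.5000, baseline_ppl=6.0212)
print(f"{tokens:,} tokens evaluated")
print(f"relative PPL change vs. baseline: {change:+.1%}")
```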
+
+ ## Chat Template
+
+ Built-in chat template (baked into GGUF):
+
+ ```
+ <|start_header_id|>system<|end_header_id|>
+ {system_message}</s><|start_header_id|>user<|end_header_id|>
+ {user_message}</s><|start_header_id|>assistant<|end_header_id|>
+ ```
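
When sending raw completion requests instead of relying on the template stored in the GGUF, the prompt has to be assembled by hand. The following is a minimal sketch that reproduces the layout shown above for a single turn. The function name `render_prompt` is hypothetical, and the exact whitespace (one newline after each header, including the final assistant header) is an assumption read off the displayed template, so the template embedded in the file remains authoritative.

```python
def render_prompt(user_message, system_message=""):
    """Render a single-turn prompt in the layout shown in the README.

    Assumption: each header is followed by exactly one newline and each
    message is terminated by </s>, as displayed in the template above.
    """
    header = "<|start_header_id|>{role}<|end_header_id|>\n"
    return (
        header.format(role="system") + system_message + "</s>"
        + header.format(role="user") + user_message + "</s>"
        + header.format(role="assistant")  # the model continues from here
    )


print(render_prompt("What is GQA?", system_message="Answer briefly."))
```

Messages are concatenated rather than passed through `str.format`, so user text containing braces is left untouched.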
+
+ ## Usage
+
+ ```bash
+ # llama.cpp server
+ llama-server -m Vikra-MixP_4.9b_S.gguf -ngl 25 -c 4096
+
+ # llama.cpp CLI
+ llama-cli -m Vikra-MixP_4.9b_S.gguf -ngl 25 -c 4096 -cnv
+ ```
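
Once `llama-server` is running, it can be queried over its OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below uses only the Python standard library and assumes the server listens on its default address `http://127.0.0.1:8080` (no `--host` or `--port` was passed in the command above). The helper names `build_chat_request` and `chat` and the `max_tokens` default are illustrative choices, not part of this repository.

```python
import json
import urllib.request


def build_chat_request(user_message, system_message=None,
                       base_url="http://127.0.0.1:8080", max_tokens=256):
    """Build a POST request for llama-server's chat completions endpoint."""
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message, **kwargs):
    """Send one chat turn and return the assistant's reply text."""
    request = build_chat_request(user_message, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Requires a running llama-server:
# print(chat("Hello! Who are you?", system_message="Answer briefly."))
```

Because the server applies the chat template baked into the GGUF, the client only sends role-tagged messages and does not need to emit the header tokens itself.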