machiabeli committed on
Commit 3dbfd6f · verified · 1 Parent(s): ff9ffe9

Upload README.md with huggingface_hub

Files changed (1): README.md (+59 -0, added)

---
library_name: mlx
tags:
- mlx
- safetensors
- kimi_k25
- quantized
- moe-aware-quant
- image-text-to-text
- conversational
- custom_code
base_model: moonshotai/Kimi-K2.6
base_model_relation: quantized
language:
- en
pipeline_tag: image-text-to-text
---

# Kimi-K2.6-MoE-Smart-Quant (MLX)

MoE-aware mixed-precision quantization of [moonshotai/Kimi-K2.6](https://huggingface.co/moonshotai/Kimi-K2.6) for Apple Silicon.

## Quantization Strategy

Unlike uniform quantization, this applies **per-component bit allocation** tuned for the MoE + MLA architecture:

| Component | Bits | Rationale |
|-----------|------|-----------|
| Routed experts (384 SwitchLinear) | 4-bit | Only 8 of 384 fire per token, so they tolerate low bit widths well |
| Shared expert (always active) | 6-bit | On the path of every token, needs precision |
| MLA value projections (v_a/v_b) | 8-bit | Most sensitive attention weights |
| MLA other projections (q_a/q_b/kv_a/kv_b/o) | 6-bit | Latent compression layer |
| lm_head + embed_tokens | 8-bit | Output quality |
| First/last 3 decoder layers | 6-bit | Boundary layer sensitivity |
| Gate/router | unquantized | Few parameters, routing-critical |
| Vision encoder | unquantized | Preserved via mlx-vlm |

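The table can be read as a lookup from weight path to bit width. A minimal sketch of such a lookup follows. The path patterns (`switch_mlp`, `shared_experts`, `mlp.gate`, `v_a_proj` and so on), the `num_layers` argument, the 4-bit fallback, and the precedence between overlapping rows are all assumptions made for illustration; none of them is taken from the actual conversion script.

```python
import re
from typing import Optional

# Hypothetical name patterns; the real module paths in the checkpoint may differ.
VALUE_PROJ = re.compile(r"\.v_[ab]_proj$")
OTHER_MLA_PROJ = re.compile(r"\.(q_a|q_b|kv_a|kv_b|o)_proj$")
LAYER_INDEX = re.compile(r"layers\.(\d+)\.")


def bits_for(path: str, num_layers: int) -> Optional[int]:
    """Return the bit width for a weight path, or None to leave it unquantized."""
    if "vision" in path or path.endswith("mlp.gate"):
        return None  # vision encoder and router stay unquantized
    if "lm_head" in path or "embed_tokens" in path:
        return 8
    if VALUE_PROJ.search(path):
        return 8
    if OTHER_MLA_PROJ.search(path) or "shared_experts" in path:
        return 6
    match = LAYER_INDEX.search(path)
    if match:
        index = int(match.group(1))
        # Assumed precedence: boundary layers override the 4-bit expert default.
        if index < 3 or index >= num_layers - 3:
            return 6
    return 4  # routed experts, plus an assumed fallback for unmatched paths
```

Under these assumptions, `bits_for("model.layers.10.mlp.switch_mlp.up_proj", num_layers=61)` gives 4, while the same path in layer 1 gives 6. The layer count of 61 is only an illustrative value.
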
**Effective average: ~4.5 bpw**, targeting near-6-bit quality at near-4-bit size.

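An effective bits-per-weight figure of this kind is a parameter-weighted mean over the components. The sketch below shows the arithmetic only. The split in `toy_split` is made up for illustration and is not measured from this model. Per-group scale and bias overhead and unquantized tensors are not counted.

```python
from typing import Iterable, Tuple


def effective_bpw(components: Iterable[Tuple[float, int]]) -> float:
    """Parameter-weighted mean bit width over (param_count, bits) pairs."""
    pairs = list(components)
    total_params = sum(count for count, _ in pairs)
    if total_params <= 0:
        raise ValueError("need at least one component with parameters")
    return sum(count * bits for count, bits in pairs) / total_params


# Made-up shares (percent of quantized parameters), not measured from this model.
toy_split = [(90, 4), (8, 6), (2, 8)]
print(effective_bpw(toy_split))  # 4.24
```

Because routed experts hold most of the parameters in an MoE model, their bit width dominates the average, and the higher-precision components move it only slightly.
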
## Model Details

- **Base model**: Kimi-K2.6 (1T params, 32B active, 384 experts)
- **Architecture**: MoE + MLA (kimi_k25)
- **Context**: 256K tokens
- **Modality**: Vision + Language (VLM)
- **Converted with**: mlx-vlm 0.4.2

## Usage

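A minimal sketch of loading the model through the mlx-vlm Python API, assuming the weights are published and that this architecture loads through the standard `load` / `generate` path. The helper `describe_image` and its arguments are hypothetical, and the repository id must be supplied by the caller since this card does not state it. Because the model is tagged `custom_code`, loading may also require explicitly trusting remote code.

```python
def describe_image(repo_id: str, image: str, prompt: str = "Describe this image."):
    """Load the quantized model with mlx-vlm and run one image + text prompt."""
    # Imported lazily so this file can be imported on machines without MLX.
    from mlx_vlm import generate, load
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(repo_id)
    config = load_config(repo_id)
    formatted = apply_chat_template(processor, config, prompt, num_images=1)
    # Depending on the mlx-vlm version this is a string or a result object.
    return generate(model, processor, formatted, [image], verbose=False)
```

A call would look like `describe_image("<user>/<repo>", "photo.jpg")`, where both arguments are placeholders.
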
## Hardware Requirements

- **Single node**: M3/M4 Ultra 192GB+ (fits in ~150GB)
- **Distributed**: 2x M3 Ultra via JACCL/RDMA for headroom

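Memory figures like these can be sanity-checked from the parameter count and the average bit width. The sketch below assumes weight size is roughly params × bpw / 8 and ignores KV cache, activations, and unquantized tensors. `reserve_fraction` is a hypothetical headroom setting, not a measured requirement.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits(num_params: float, bits_per_weight: float, memory_gb: float,
         reserve_fraction: float = 0.25) -> bool:
    """True if the weights fit while leaving reserve_fraction of memory free."""
    budget_gb = memory_gb * (1 - reserve_fraction)
    return weight_footprint_gb(num_params, bits_per_weight) <= budget_gb


print(weight_footprint_gb(1e12, 4.5))  # 562.5
```

With this card's own numbers (1T parameters, ~4.5 bpw) the estimate is about 562 GB of weights, well above the ~150GB quoted in the list, so the single-node figure is worth re-checking against the uploaded shard sizes.
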
---

*Weights uploading; conversion in progress.*