caiovicentino1 committed on
Commit 35b5989 · verified · 1 Parent(s): 6837d50

docs: add HLWQ rebrand notice (cite Han et al. PolarQuant prior art)

Files changed (1)
  1. README.md +17 -7
README.md CHANGED
@@ -3,17 +3,27 @@ license: other
3
  license_name: nvidia-open-model-license
4
  base_model: nvidia/Nemotron-Cascade-2-30B-A3B
5
  tags:
6
- - polarquant
7
- - moe
8
- - expert-offloading
9
- - nemotron
10
- - mamba
11
- - consumer-gpu
12
- - vllm
 
13
  library_name: transformers
14
  pipeline_tag: text-generation
15
  ---
16
 
17
  # Nemotron-Cascade-2-30B-A3B — Expert Offloading + PolarQuant Q5
18
 
19
  **30B MoE model at 7.6 GB VRAM, 15+ tok/s, correct output.**
 
3
  license_name: nvidia-open-model-license
4
  base_model: nvidia/Nemotron-Cascade-2-30B-A3B
5
  tags:
6
+ - hlwq
7
+ - polarquant
8
+ - moe
9
+ - expert-offloading
10
+ - nemotron
11
+ - mamba
12
+ - consumer-gpu
13
+ - vllm
14
  library_name: transformers
15
  pipeline_tag: text-generation
16
  ---
17
 
18
+ > [!IMPORTANT]
19
+ > **Naming notice (2026-04-10).** The "PolarQuant" technique used in this model is being rebranded to **HLWQ (Hadamard-Lloyd Weight Quantization)**. The change is only the name; the algorithm and the weights in this repository are unchanged.
20
+ >
21
+ > The rebrand resolves a name collision with an unrelated, earlier KV cache quantization method also named PolarQuant ([Han et al., arXiv:2502.02617, 2025](https://arxiv.org/abs/2502.02617)). HLWQ addresses **weight** quantization with a **deterministic Walsh-Hadamard rotation** and Lloyd-Max scalar codebook; Han et al.'s PolarQuant addresses **KV cache** quantization with a **random polar rotation**. The two methods are technically distinct.
22
+ >
23
+ > Existing loaders that load this repository by ID continue to work without changes. Future model uploads will use the HLWQ name.
24
+ >
25
+ > Reference paper for this technique: [arXiv:2603.29078](https://arxiv.org/abs/2603.29078) (v2 in preparation; v1 still uses the old name).
26
+
27
  # Nemotron-Cascade-2-30B-A3B — Expert Offloading + PolarQuant Q5
28
 
29
  **30B MoE model at 7.6 GB VRAM, 15+ tok/s, correct output.**
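
The naming notice describes HLWQ as a deterministic Walsh-Hadamard rotation followed by a Lloyd-Max scalar codebook. The sketch below illustrates those two generic building blocks in plain Python so the terminology is concrete. It is not the repository's implementation: the block size of 64, the 32-level codebook (an assumed reading of "Q5" as 5 bits), the per-block codebook fit, and all function names (`fwht`, `lloyd_max`, `quantize_block`, `dequantize_block`) are hypothetical choices made for illustration.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def nearest(codebook, x):
    """Index of the codebook level closest to x."""
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - x))


def lloyd_max(samples, levels, iterations=25):
    """Fit a scalar codebook by alternating nearest-level assignment and centroid updates."""
    ordered = sorted(samples)
    # Initialise the levels at evenly spaced quantiles of the data.
    codebook = [ordered[int((k + 0.5) * len(ordered) / levels)] for k in range(levels)]
    for _ in range(iterations):
        sums = [0.0] * levels
        counts = [0] * levels
        for x in samples:
            idx = nearest(codebook, x)
            sums[idx] += x
            counts[idx] += 1
        # Keep the previous level when no sample was assigned to it.
        codebook = [sums[k] / counts[k] if counts[k] else codebook[k] for k in range(levels)]
    return codebook


def quantize_block(block, codebook):
    """Rotate a weight block, then store one codebook index per coefficient."""
    return [nearest(codebook, v) for v in fwht(block)]


def dequantize_block(indices, codebook):
    """Look up the levels and undo the rotation (the orthonormal transform is its own inverse)."""
    return fwht([codebook[i] for i in indices])


# Toy data standing in for one block of weights (assumption: Gaussian, 64 values).
random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(64)]
codebook = lloyd_max(fwht(block), levels=32)  # 32 levels = 5 bits per value (assumed)
codes = quantize_block(block, codebook)
restored = dequantize_block(codes, codebook)

error = sum((a - b) ** 2 for a, b in zip(block, restored))
energy = sum(a * a for a in block)
print("relative squared error:", error / energy)
```

Because the rotation is fixed rather than random, nothing beyond the codebook and the indices needs to be stored to invert it, which is one plausible reason for the "deterministic" emphasis in the notice.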