jorge-erdb committed · Commit 7dc059b · verified · 1 Parent(s): 947f70e

Update README.md

Files changed (1)
  1. README.md +231 -3
README.md CHANGED
@@ -1,3 +1,231 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ base_model:
3
+ - Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash
4
+ tags:
5
+ - text-generation-inference
6
+ - transformers
7
+ - unsloth
8
+ - qwen3_5
9
+ - reasoning
10
+ - distillation
11
+ - deepseek
12
+ - deepseek-v4
13
+ - sft
14
+ - long-cot
15
+ - chain-of-thought
16
+ - efficient-inference
17
+ - agent
18
+ - multilingual
19
+ - 4bit
20
+ license: apache-2.0
21
+ language:
22
+ - en
23
+ - zh
24
+ - ko
25
+ - ja
26
+ - es
27
+ - ru
28
+ pipeline_tag: image-text-to-text
29
+ datasets:
30
+ - Jackrong/DeepSeek-V4-Distill-8000x
31
+ ---
32
+ # Qwen3.5-9B-DeepSeek-V4-Flash-GGUF
33
+
34
+ GGUF quantizations of [Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash](https://huggingface.co/Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash), Jackrong's distillation of DeepSeek-V4's reasoning into the Qwen3.5-9B architecture using the [DeepSeek-V4-Distill-8000x](https://huggingface.co/datasets/Jackrong/DeepSeek-V4-Distill-8000x) dataset.
35
+
36
+ ---
37
+
38
+ ## Quantization Details
39
+
40
+ | Detail | Value |
41
+ |---|---|
42
+ | Quants | Q4_K_M, **D-IQ4_NL** (dynamic) |
43
+ | Quantized by | [jorge-erdb](https://huggingface.co/jorge-erdb) |
44
+ | Method | [llama.cpp](https://github.com/ggml-org/llama.cpp) |
45
+ | Source model | [Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash](https://huggingface.co/Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash) (BF16) |
46
+
47
+ ### Dynamic IQ4_NL Recipe
48
+
49
+ The **D-IQ4_NL** variant uses a custom tensor-level precision recipe rather than a uniform IQ4_NL quantization.
50
+
51
+ ---
52
+
53
+ ## Download
54
+
55
+ ```bash
56
+ pip install -U "huggingface_hub[cli]"
57
+
58
+ # Q4_K_M (smallest, broadest backend support)
59
+ huggingface-cli download jorge-erdb/Qwen3.5-9B-DeepSeek-V4-Flash-GGUF --include "*Q4_K_M.gguf" --local-dir ./
60
+
61
+ # D-IQ4_NL (dynamic, Q6_K-protected attention/SSM)
62
+ huggingface-cli download jorge-erdb/Qwen3.5-9B-DeepSeek-V4-Flash-GGUF --include "*D-IQ4_NL.gguf" --local-dir ./
63
+ ```
64
+
65
+ ---
66
+
67
+ > [!IMPORTANT]
68
+ > ## Apple Metal Backend Warning
69
+ >
70
+ > **D-IQ4_NL uses non-linear quantization on FFN tensors.** It performs sub-optimally on Apple's Metal backend due to the lack of native support for non-linear dequantization kernels. If you are running on an Apple Silicon Mac with GPU offloading via Metal, you will likely experience:
71
+ >
72
+ > - Slower inference compared to linear quants of similar size (e.g., Q4_K_M)
73
+ > - No speed benefit from the ARM weight repacking that IQ4_NL supports on CPU
74
+ >
75
+ > **If you're on Apple Metal, use the Q4_K_M variant instead.** D-IQ4_NL is best suited for CUDA (NVIDIA GPU) or CPU-only inference.
76
+
77
+ ---
78
+
79
+ ## Credits
80
+
81
+ - **Quantization**: [jorge-erdb](https://huggingface.co/jorge-erdb)
82
+ - **Distillation & training**: [Jackrong](https://huggingface.co/Jackrong) β€” Qwen3.5-9B-DeepSeek-V4-Flash
83
+ -
84
+ ---
85
+ # 🌟 Qwen3.5-9B-DeepSeek-V4-Flash
86
+
87
+ ## πŸ’‘ Model Overview & Design
88
+
89
+ ![ChatGPT Image Apr 24, 2026 at 04_32_09 PM](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/J3m3YKzmCmDtbKOZNPCW-.png)
90
+
91
+ > [!NOTE]
92
+ > **Qwen3.5-9B-DeepSeek-V4-Flash** is an efficient reasoning model distilled using high-quality data from **DeepSeek-V4**.
93
+
94
+
95
+ - By leveraging the dataset **Jackrong/DeepSeek-V4-Distill-8000x**, this model successfully transfers the advanced structured reasoning and multi-step problem-solving capabilities of the DeepSeek-V4 architecture into the highly efficient **Qwen3.5-9B** parameter space.
96
+
97
+ - This model was trained in an **Unsloth** environment, prioritizing stable gradient propagation and rigorous data curation to ensure the distillation process avoids merely learning "hollow chain-of-thought" and instead captures genuine logical generalization.
98
+
99
+ Designed for:
100
+ - 🧩 **Structured Reasoning**: Inheriting DeepSeek-V4's deep logic capabilities.
101
+ - ⚑ **Flash Inference**: Maintaining the token-efficiency and speed of the 9B parameter size.
102
+ - πŸ”§ **Tool-augmented Workflows**: Reliable agentic action generation.
103
+
104
+ ---
105
+
106
+ ### 🍎 About the Teacher Model: DeepSeek-V4
107
+
108
+ ![dsv4_performance](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/iBQ7B-z3bpdmsJkdmEPGC.png)
109
+
110
+
111
+ **[DeepSeek-V4](https://huggingface.co/collections/deepseek-ai/deepseek-v4)** is the latest flagship open-source model series from DeepSeek, engineered for extreme efficiency, million-token long context (1M), and advanced Agentic workflows. As the source for this distillation, DeepSeek-V4 provides the high-fidelity reasoning signals necessary to push a 9B model beyond its architectural limits.
112
+
113
+ **Key Technical Strengths of the Teacher Model:**
114
+
115
+ * **πŸ† World-Class Reasoning & Coding:** DeepSeek-V4 demonstrates elite performance in mathematics (MATH-500), STEM subjects, and real-world software engineering (SWE-bench). Its "Think" modes provide the sophisticated Long-CoT (Chain-of-Thought) traces that define this model's logic.
116
+ * **🧠 Architectural Innovation:** * **Hybrid Attention & DSA:** Features Token-level compression and DeepSeek Sparse Attention, which reduces KV Cache memory overhead by up to 90%, allowing for highly efficient long-context processing.
117
+ * **Engram Memory & mHC:** Utilizes Manifold-constrained Hyper-connections to decouple factual knowledge retrieval from dynamic logical reasoning, ensuring exceptional stability and generalization.
118
+ * **πŸ€– Agent-Centric Design:** Specifically optimized for multi-step tool calling and complex environment interaction, ensuring that the distilled knowledge includes reliable "how-to-act" procedures, not just "how-to-talk."
119
+
120
+ By distilling from **DeepSeek-V4-Flash**, we have successfully mapped the high-density logic of a trillion-parameter class model onto the agile and high-speed **Qwen3.5-9B** framework.
121
+
122
+ ---
123
+
124
+ ## 🀝 Collaboration & Training Details
125
+ This model is the result of a close collaboration with hardware engineer **Kyle Hessling**. He generously provided the crucial compute equipment and managed both the rigorous post-training testing and continuous server maintenance.
126
+ I want to express my gratitude to Kyle for his invaluable support!
127
+ You can find him on X/Twitter here: [@KyleHessling1](https://x.com/KyleHessling1)
128
+
129
+ **Training Infrastructure & Configuration:**
130
+ - πŸ–₯️ **Hardware:** NVIDIA DGX
131
+ - πŸ’Ύ **Training Data:** DeepSeek-V4-Distill-8000x
132
+ - πŸ§ͺ **Training Method:** Distillation
133
+
134
+ ---
135
+
136
+ ## 🎯 Motivation & Distillation Insights
137
+ - 🧠 **Latent Knowledge Activation**: DeepSeek-V4's reasoning traces help the Qwen3.5-9B model activate its existing latent knowledge more effectively.
138
+ - πŸ—οΈ **Learning Procedures**: The model learns actual problem-solving procedures, not just the output format.
139
+ - πŸš€ **Efficiency**: The 8000x dataset provides a dense signal, allowing the 9B model to converge on reasoning tasks much faster than traditional large-scale SFT.
140
+
141
+ ---
142
+
143
+ ## πŸ“Š Evaluation
144
+ > [!IMPORTANT]
145
+ > This is an early controlled **Q5_K_M** comparison between **Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash** and the official **Qwen3.5-9B** base model.
146
+ >
147
+ > This evaluation was completed by **Kyle Hessling**, who ran the same evaluation suite twice under the same local inference conditions: once on the DeepSeek-V4 distill model and once on the official Qwen3.5-9B base model.
148
+
149
+
150
+ - ❀️ Special thanks to Kyle for the careful post-training testing and detailed comparison report. You can find him on X/Twitter here: **[@KyleHessling1](https://x.com/KyleHessling1)**.
151
+ - πŸ“„ Full evaluation report: **[KyleHessling1/jackrong-deepseek-9b-eval](https://huggingface.co/spaces/KyleHessling1/jackrong-deepseek-9b-eval)**.
152
+
153
+ ![Evaluation Report](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/GtqFy-my7GXQ3xRRXTxYp.png)
154
+
155
+ ![Comparison Method](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/-w7X_kpErCPYV5QHB-jw3.png)
156
+
157
+ ![Agentic Reasoning Results](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/DFAx6miaEoXuqmSPSSJAC.png)
158
+
159
+ ![Front-end Design Results](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/W_mUxkwfRYcZOyGy4sPx2.png)
160
+
161
+ ![Tool Calling Results](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/rCJPUY0KnB8mkyI7yAI-3.png)
162
+
163
+ ![Evaluation Setup](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/6mzcBTSgLLT_kL1dHafAy.png)
164
+
165
+ ---
166
+ ## πŸ”¬ Supporting Evidence
167
+
168
+ Recent work and empirical tests support this distillation approach:
169
+
170
+ **Ren et al., 2026 β€” *Rethinking Generalization in Reasoning SFT*** ([arXiv:2604.06628](https://arxiv.org/abs/2604.06628))
171
+
172
+ The paper suggests that generalization in reasoning SFT is conditional.
173
+ Key takeaways:
174
+ - **High-quality long-CoT data** from DeepSeek-V4 enables cross-domain transfer.
175
+ - **Optimization Discipline**: Short, highly-curated distillation (8000 examples) prevents the model from overfitting to the teacher's stylistic quirks while preserving the core reasoning engine.
176
+
177
+ ---
178
+
179
+ ## πŸ› οΈ Best Practices
180
+
181
+ For optimal performance, we recommend the following generation parameters:
182
+
183
+ * `temperature=0.7` to `1.0` (Use lower temperature for strict coding tasks, higher for creative reasoning)
184
+ * `top_p=0.95`
185
+
186
+ When interacting with the model, using a structured prompt template or standard ChatML format will yield the best reasoning results.
187
+
188
+ ---
189
+
190
+ ## πŸ“š Resources & Guides
191
+
192
+ πŸ‘‰ **[GitHub Repository: Jackrong-llm-finetuning-guide](https://github.com/R6410418/Jackrong-llm-finetuning-guide.git)**
193
+ Visit the repository to dive into the codebase and reproduce the results locally or on Colab.
194
+
195
+ ### πŸ“₯ Core Technical Document
196
+ **πŸ”— [Complete Fine-Tuning Guide (PDF)](https://github.com/R6410418/Jackrong-llm-finetuning-guide/blob/main/guidePDF/Qwopus3-5-9b-Colab_complete_guide_to_llm_finetuning.pdf)**
197
+
198
+ > **A Note:**
199
+ > My goal isn't just to detail a workflow, but to demystify LLM training. Beyond the social media hype, fine-tuning isn't an unattainable ritualβ€”often, all you need is a Google account, a standard laptop, and relentless curiosity.
200
+ > All training and testing for this project were self-funded. If you find this model or guide helpful, a **Star ⭐️ on GitHub** would be the greatest encouragement. Thank you! πŸ™
201
+
202
+ ---
203
+
204
+ ## ⚠️ Limitations
205
+
206
+ - **Parameter Constraints**: While enhanced by DeepSeek-V4 distillation, the model is still bound by the 9B parameter limits and may struggle with extremely obscure knowledge.
207
+ - **Over-reasoning**: On very simple queries, the model might still attempt to produce a lengthy reasoning chain due to the SFT bias.
208
+ - **Safety Trade-offs**: Asymmetric gains mean that while reasoning improves, certain alignment-sensitive behaviors might regress.
209
+
210
+ ---
211
+
212
+ ## πŸ™ Acknowledgements
213
+
214
+ Special thanks to:
215
+ - **DeepSeek Team** for the foundational advancements in the V4 architecture.
216
+ - **Unsloth** for efficient fine-tuning frameworks.
217
+ - Open-source datasets and community contributors.
218
+ - Researchers exploring reasoning SFT and distillation.
219
+
220
+ ---
221
+
222
+ ## πŸ“– Citation
223
+
224
+ ```bibtex
225
+ @misc{jackrong_qwen35_9b_deepseek_v4_flash,
226
+ title = {Qwen3.5-9B-DeepSeek-V4-Flash},
227
+ author = {Jackrong},
228
+ year = {2026},
229
+ publisher = {Hugging Face}
230
+ }
231
+ ```