jpacifico committed on
Commit f192fd7 · verified · 1 Parent(s): 01cf917

Update README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -29,11 +29,14 @@ Optimized variants (MLX, GGUF) are also available, making the model particularly
  ## Model Overview
 
  - **Base model:** Qwen/Qwen3-4B-Instruct-2507 (non-thinking)
- - **Architecture:** Decoder-only transformer
  - **Parameters:** 4.0B
- - **Window Context :** 256K long-context understanding
+ - **Context Length:** 262,144 natively
  - **Post training methods:** DPO + Model Merging
- - **Objective:** Improve French instruction-following and reasoning through post-training, while maintaining strong multilingual performance and keeping the same model size.
+
+ Note: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its outputs.
+ The choice of Qwen3 over Qwen3.5 is intentional: Qwen3 provides a more direct text-oriented generation style, better aligned with the objectives of this post-training. Empirically, my post-training on Qwen3 yielded *more consistent improvements across evaluated benchmarks* compared to Qwen3.5.
+ For use cases requiring explicit reasoning traces or structured thinking outputs, Qwen3.5 (thinking mode) is recommended.
+ Chocolatine-2-4B-Instruct-DPO-v2.1 is therefore optimized for direct instruction-following and practical deployment, rather than explicit CoT generation.
 
  **Model Variants**
 
@@ -45,7 +48,7 @@ Optimized variants (MLX, GGUF) are also available, making the model particularly
 
  Performance improves consistently across all tested FR benchmarks :
 
- | Benchmark | Qwen3-4B-Thinking-2507 | Chocolatine-2-4B-Instruct-DPO-v2.1 |
+ | Benchmark | Qwen3-4B-Thinking-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
  |---|---:|---:|
  | french_bench_arc_challenge | 47.13 | **49.79** |
  | french_bench_grammar | 70.59 | **72.27** |
@@ -63,9 +66,9 @@ Chocolatine-2-4B-Instruct-DPO-v2.1 is derived from [Qwen/Qwen3-4B-Instruct-2507]
 
  **Stage 1 – DPO (Compar:IA adaptation)**
 
- Direct Preference Optimization (DPO) on a DPO-adapted version of **[Compar:IA](https://comparia.beta.gouv.fr/datasets)** data, derived from the preference dataset [comparia-votes](https://huggingface.co/datasets/ministere-culture/comparia-votes) of the French government initiative.
- The dataset was restructured into preference pairs and curated to build an original dataset specifically designed for DPO fine-tuning.
- Two dataset variants were constructed ([6k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-6k) and [13k](jpacifico/comparia-dpo-pairs-bt-13k) preference pairs).
+ Direct Preference Optimization (DPO) on a DPO-adapted version of **[Compar:IA](https://comparia.beta.gouv.fr/datasets)** data, derived from the preference dataset [comparia-votes](https://huggingface.co/datasets/ministere-culture/comparia-votes), part of a public initiative led by the Ministry of Culture (French gov). Previous iterations of the Chocolatine model series also were selected as part of this initiative.
+ I constructed an original DPO dataset from these votes by transforming them into preference pairs (chosen / rejected), with additional filtering and formatting steps to make them suitable for DPO fine-tuning.
+ Two dataset variants were created ([6k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-6k) and [13k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-13k) preference pairs).
  The **6k variant** was used for the DPO training reported in this release.
 
  **Stage 2 – DPO (French-ORCA pairs)**
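The added Stage 1 lines describe turning Compar:IA votes into chosen/rejected preference pairs, with extra filtering and formatting before DPO fine-tuning. A minimal Python sketch of that idea follows, under loud assumptions: the `Vote` record and its field names are hypothetical (the comparia-votes schema is not shown in this commit), and the filters (drop ties, empty or identical answers, exact duplicates) are illustrative, not the rules actually used to build the 6k/13k datasets. `dpo_loss` is the standard per-pair DPO objective, -log σ(β · margin), that such pairs are typically trained with; `beta=0.1` is an arbitrary example value, not this release's setting.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Iterable, Optional


# HYPOTHETICAL vote record: field names are illustrative assumptions,
# not the actual comparia-votes schema.
@dataclass
class Vote:
    question: str
    response_a: str
    response_b: str
    winner: Optional[str]  # "a", "b", or None for a tie / no preference


def votes_to_dpo_pairs(votes: Iterable[Vote]) -> list[dict]:
    """Turn pairwise votes into {"prompt", "chosen", "rejected"} records.

    The filters below are assumed examples of "additional filtering",
    not the rules used to build the released datasets.
    """
    pairs: list[dict] = []
    seen: set[tuple[str, str, str]] = set()
    for v in votes:
        if v.winner not in ("a", "b"):
            continue  # tie or missing vote: no preference signal
        if v.winner == "a":
            chosen, rejected = v.response_a, v.response_b
        else:
            chosen, rejected = v.response_b, v.response_a
        if not chosen.strip() or not rejected.strip() or chosen == rejected:
            continue  # empty or identical answers carry no useful contrast
        key = (v.question, chosen, rejected)
        if key in seen:
            continue  # drop exact duplicate pairs
        seen.add(key)
        pairs.append({"prompt": v.question, "chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    margin = (policy - reference log-prob) of the chosen answer
             minus the same quantity for the rejected answer.
    """
    x = beta * ((policy_chosen_logp - ref_chosen_logp)
                - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) = log(1 + exp(-x)); branch to avoid exp() overflow
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    votes = [
        Vote("Q1", "A1", "B1", "a"),
        Vote("Q2", "A2", "B2", "b"),
        Vote("Q3", "A3", "B3", None),  # tie: dropped
        Vote("Q1", "A1", "B1", "a"),   # exact duplicate: dropped
    ]
    for pair in votes_to_dpo_pairs(votes):
        print(pair)
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # zero margin -> log(2) ~= 0.6931
```

With these sample votes, the tie and the duplicate are discarded, leaving two pairs. A zero margin gives a loss of log 2, and a policy that favors the chosen answer more strongly than the reference model does drives the loss below that.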