---
language:
- fr
- en
library_name: transformers
tags:
- dpo
- post-training
- french
- alignment
- model-merging
- qwen3
- chocolatine
- comparia
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- jpacifico/comparia-dpo-pairs-bt-6k
- jpacifico/french-orca-dpo-pairs-revised
---
# Chocolatine-2-4B-Instruct-DPO-v2.1
**Chocolatine-2-4B-Instruct-DPO-v2.1** is a post-trained version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), designed to improve instruction-following, reasoning, and overall performance in French, while preserving strong multilingual capabilities.
In my evaluation setup, it delivers consistent gains across the tested French benchmarks, pointing to a broad improvement in French capabilities.
Although the post-training pipeline focuses on French preference data, no significant degradation is observed on English tasks: scores stay within roughly one point of the base model, with slight improvements on some benchmarks, suggesting positive cross-lingual transfer.
Optimized variants (MLX, GGUF) are also available, making the model particularly suitable for local inference.
## Model Overview
- **Base model:** Qwen/Qwen3-4B-Instruct-2507
- **Parameters:** 4.0B
- **Context length:** 262,144 tokens natively
- **Post-training methods:** DPO + Model Merging
Note: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its outputs.
This design is consistent with the goals of the post-training setup, which favors a compact dense instruct model focused on direct generation efficiency and practical downstream use.
For use cases requiring explicit reasoning traces or structured thinking outputs, Qwen/Qwen3.5-4B (thinking mode) is recommended.
**Model Variants**
- Chocolatine-2-4B-Instruct-DPO-v2.1 (this repo): Contains the retrainable weights in BF16 format
- Quantized GGUF versions: [Q4_K_M](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q4_K_M-GGUF) / [Q8_0](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q8_0-GGUF), and more from mradermacher [here](https://huggingface.co/mradermacher/Chocolatine-2-4B-Instruct-DPO-v2.1-GGUF)
- MLX (optimized for Apple silicon): [4Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-4Bit) / [8Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-8Bit)
**Ollama**: In addition to the Hugging Face release, quantized 4-bit and 8-bit variants are also available [here](https://ollama.com/jpacifico/chocolatine-2.1) on Ollama for convenient local inference.
## Benchmarks
The results indicate a consistent improvement across the tested French benchmarks, covering several capability types. This suggests a broad gain in French performance, while English results remain overall stable.
| Benchmark (French) | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| gpqa-fr:diamond | 28.93 | **32.49** |
| french_bench_arc_challenge | 47.13 | **49.79** |
| french_bench_grammar | 70.59 | **72.27** |
| french_bench_boolqa | 88.76 | **89.89** |
| french_bench_hellaswag | 56.99 | **58.03** |
| global_mmlu_fr | 63.75 | **64.75** |
| xwinograd_fr | 66.27 | **67.47** |
| fr_mt_bench | 6.22 | **6.44** |
*FR-MT-Bench* evaluation is performed on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), using [multilingual-mt-bench](https://github.com/jpacifico/multilingual_mt_bench) with OpenAI/GPT-5 as the LLM judge.
*global_mmlu_fr*, *xwinograd_fr* and *french_bench* results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
*gpqa-fr:diamond* results were obtained using LightEval with vLLM, following the [kurakurai/Luth](https://github.com/kurakurai/Luth.git) evaluation process.
| Benchmark (English) | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| arc_challenge | **58.79** | 58.45 |
| hellaswag | 69.08 | **70.16** |
| boolq | 84.80 | **85.32** |
| gpqa_diamond_zeroshot | **38.89** | 38.38 |
English benchmark results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
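The per-benchmark differences can be recomputed directly from the two tables above. The following is a minimal sketch: the dictionary layout and the `deltas` / `mean_delta` helper names are illustrative only and do not come from any evaluation tool, and `fr_mt_bench` is left out because it is scored on a different scale.

```python
# Scores copied from the tables above: (base model, Chocolatine-2-4B-Instruct-DPO-v2.1).
FRENCH = {
    "gpqa-fr:diamond": (28.93, 32.49),
    "french_bench_arc_challenge": (47.13, 49.79),
    "french_bench_grammar": (70.59, 72.27),
    "french_bench_boolqa": (88.76, 89.89),
    "french_bench_hellaswag": (56.99, 58.03),
    "global_mmlu_fr": (63.75, 64.75),
    "xwinograd_fr": (66.27, 67.47),
}
ENGLISH = {
    "arc_challenge": (58.79, 58.45),
    "hellaswag": (69.08, 70.16),
    "boolq": (84.80, 85.32),
    "gpqa_diamond_zeroshot": (38.89, 38.38),
}


def deltas(scores):
    """Return {benchmark: tuned - base}, rounded to two decimals."""
    return {name: round(tuned - base, 2) for name, (base, tuned) in scores.items()}


def mean_delta(scores):
    """Unweighted average of the per-benchmark differences."""
    values = deltas(scores).values()
    return sum(values) / len(values)


if __name__ == "__main__":
    for label, table in (("French", FRENCH), ("English", ENGLISH)):
        for name, diff in deltas(table).items():
            print(f"{label:8s} {name:28s} {diff:+.2f}")
        print(f"{label:8s} mean difference: {mean_delta(table):+.2f}")
```

On these numbers, every French benchmark moves up (by about 1.75 points on average), while the English differences are mixed in sign and average out to about +0.2 points.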
## Training & Alignment Pipeline
Chocolatine-2-4B-Instruct-DPO-v2.1 is derived from [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) using a multi-step post-training pipeline:
**Stage 1 – DPO (Compar:IA adaptation)**
Direct Preference Optimization (DPO) on a DPO-adapted version of **[Compar:IA](https://comparia.beta.gouv.fr/datasets)** data, derived from the preference dataset [comparia-votes](https://huggingface.co/datasets/ministere-culture/comparia-votes), which is part of a public initiative led by the French Ministry of Culture. Previous iterations of the Chocolatine model series were also selected as part of this initiative.
I constructed an original DPO dataset from these votes by transforming them into preference pairs (chosen / rejected), with additional filtering and formatting steps to make them suitable for DPO fine-tuning.
Two dataset variants were created ([6k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-6k) and [13k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-13k) preference pairs).
The **6k variant** was used for the DPO training reported in this release.
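To illustrate the general idea of turning pairwise votes into preference pairs, here is a minimal sketch. It is not the actual preprocessing used for this release: the input field names (`question`, `response_a`, `response_b`, `vote`) are hypothetical and do not reflect the real comparia-votes schema, and the filters shown are only examples. The output uses the `prompt` / `chosen` / `rejected` layout commonly expected by DPO trainers.

```python
def votes_to_pairs(votes, min_chars=1):
    """Convert pairwise votes into DPO records.

    All input field names are hypothetical; adapt them to the real schema.
    """
    pairs = []
    for vote in votes:
        # Ties and missing votes carry no preference signal: skip them.
        if vote.get("vote") not in ("a", "b"):
            continue
        answer_a = vote["response_a"].strip()
        answer_b = vote["response_b"].strip()
        # Example filters: drop empty or identical answers.
        if len(answer_a) < min_chars or len(answer_b) < min_chars:
            continue
        if answer_a == answer_b:
            continue
        if vote["vote"] == "a":
            chosen, rejected = answer_a, answer_b
        else:
            chosen, rejected = answer_b, answer_a
        pairs.append(
            {"prompt": vote["question"].strip(), "chosen": chosen, "rejected": rejected}
        )
    return pairs


if __name__ == "__main__":
    sample = [
        {"question": "Quelle est la capitale de la France ?",
         "response_a": "Paris.", "response_b": "Lyon.", "vote": "a"},
        {"question": "Bonjour", "response_a": "Salut !", "response_b": "Salut !", "vote": "b"},
        {"question": "2 + 2 ?", "response_a": "4", "response_b": "5", "vote": "tie"},
    ]
    for pair in votes_to_pairs(sample):
        print(pair)
```

Only the first sample vote survives here: the second has identical answers and the third is a tie.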
**Stage 2 – DPO (French-ORCA pairs)**
A second DPO stage using a French version of the ORCA preference pairs, based on the dataset **[jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised)**, commonly used in the Chocolatine training pipeline.
This stage further improves general instruction alignment, robustness across tasks, and cross-lingual capabilities.
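For readers unfamiliar with DPO, the objective optimized in both DPO stages can be sketched for a single preference pair as follows. This is the standard DPO loss written in plain Python, not the training code used for this model; the `dpo_loss` name is illustrative, and `beta=0.1` is an assumed value, since the hyperparameters of this release are not stated here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair, from summed sequence log-probabilities.

    `beta` is an assumed value, not the one used for this release.
    """
    # Implicit rewards: how much more likely each answer is under the
    # policy than under the frozen reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy favors the chosen answer more than the reference does: lower loss.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss decreases as the policy raises the likelihood of the chosen answer relative to the rejected one, compared with the reference model.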
**Stage 3 – Model Merging (MergeKit + TIES)**
The resulting checkpoints were merged using **MergeKit** with the TIES method.
TIES merging selects task-relevant parameter updates, reduces destructive interference between models, and preserves base model stability.
MergeKit configuration:
```yaml
# ties2 recipe
models:
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test2
    parameters:
      density: 0.5
      weight: 0.5
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test-b3
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507
parameters:
  normalize: false
  int8_mask: true
dtype: bfloat16
```
```
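To make the roles of `density`, `weight` and `normalize` in the configuration above more concrete, here is a simplified sketch of the TIES procedure (trim, elect sign, merge) on flat lists of floats. It is an illustration under stated assumptions, not MergeKit's implementation: real merges operate tensor by tensor, the function names are made up for this example, and the sign election is assumed to use the sum of the weighted, trimmed updates.

```python
def trim(delta, density):
    """Keep the `density` fraction of entries with the largest magnitude."""
    keep_count = int(round(density * len(delta)))
    by_magnitude = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(by_magnitude[:keep_count])
    return [value if i in keep else 0.0 for i, value in enumerate(delta)]


def ties_merge(base, finetuned, weights, density=0.5, normalize=False):
    """Simplified TIES merge of several fine-tuned models onto a base model."""
    # 1. Task vectors: what each fine-tuned model changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # 2. Trim each task vector, then scale it by its model weight.
    weighted = [[w * d for d in trim(delta, density)]
                for delta, w in zip(deltas, weights)]
    merged = []
    for i, base_value in enumerate(base):
        column = [vector[i] for vector in weighted]
        # 3. Elect the sign carrying the larger total mass for this parameter.
        elected = 1.0 if sum(column) >= 0 else -1.0
        # 4. Sum only the updates that agree with the elected sign.
        agreeing = [(c, w) for c, w in zip(column, weights) if c * elected > 0]
        update = sum(c for c, _ in agreeing)
        if normalize and agreeing:
            update /= sum(w for _, w in agreeing)
        merged.append(base_value + update)
    return merged


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]
    model_a = [1.0, -0.2, 0.4, 0.0]
    model_b = [0.8, 0.6, -1.0, 0.1]
    print(ties_merge(base, [model_a, model_b], weights=[0.5, 0.5],
                     density=0.5, normalize=False))
```

In this toy example, the two models agree on the first parameter, so their weighted updates add up; on the third parameter they disagree, and only the update matching the elected (negative) sign is kept. Parameters whose updates were trimmed away in both models stay at the base value.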
## Usage
The following code snippet illustrates how to use the model to generate content based on given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
```
## Limitations
The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanism.
- **Developed by:** Jonathan Pacifico, 2026
- **Model type:** LLM
- **Language(s) (NLP):** French, English
- **License:** Apache-2.0
Made with ❤️ in France