---
language:
- fr
- en
library_name: transformers
tags:
- dpo
- post-training
- french
- alignment
- model-merging
- qwen3
- chocolatine
- comparia
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- jpacifico/comparia-dpo-pairs-bt-6k
- jpacifico/french-orca-dpo-pairs-revised
---

# Chocolatine-2-4B-Instruct-DPO-v2.1

**Chocolatine-2-4B-Instruct-DPO-v2.1** is a post-trained version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), designed to improve instruction-following, reasoning, and overall performance in French, while preserving strong multilingual capabilities.  
In my evaluation setup, it delivers consistent gains across the tested French benchmarks, pointing to a broad improvement in French capabilities.  
Although the post-training pipeline focuses on French preference data, English results remain broadly stable: there are slight gains on HellaSwag and BoolQ and slight drops on ARC-Challenge and GPQA Diamond, with the gains suggesting some positive cross-lingual transfer.  
Optimized variants (MLX, GGUF) are also available, making the model particularly suitable for local inference.  


## Model Overview

- **Base model:** Qwen/Qwen3-4B-Instruct-2507
- **Parameters:** 4.0B
- **Context length:** 262,144 tokens (native)
- **Post-training methods:** DPO + Model Merging

Note: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its outputs.  
This matches the goals of the post-training setup, which favors a compact dense instruct model optimized for direct, efficient generation and practical downstream use.  
For use cases requiring explicit reasoning traces or structured thinking outputs, Qwen/Qwen3.5-4B (thinking mode) is recommended.
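
Because the model answers directly, its raw output can be used as the final answer. When comparing it with a thinking-mode model, a small helper such as the hypothetical `split_thinking` below (not part of any library) can separate a reasoning trace from the answer; for this model the reasoning part should simply come back empty. The helper assumes that a thinking model ends its reasoning with a closing `</think>` tag, and it also tolerates outputs where the opening `<think>` tag is absent.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a model output into (reasoning, answer).

    Hypothetical helper: assumes reasoning, if any, ends with a closing
    </think> tag; the opening <think> tag may or may not be present.
    """
    marker = "</think>"
    if marker not in text:
        # Non-thinking output: the expected case for this model.
        return "", text.strip()
    reasoning, _, answer = text.rpartition(marker)
    # Drop an explicit opening tag if the model emitted one.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


print(split_thinking("Paris est la capitale de la France."))
# ('', 'Paris est la capitale de la France.')
```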

**Model Variants**

- Chocolatine-2-4B-Instruct-DPO-v2.1 (this repo): Contains the retrainable weights in BF16 format
- Quantized GGUF versions: [Q4_K_M](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q4_K_M-GGUF) / [Q8_0](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q8_0-GGUF), with more quantizations available from mradermacher [here](https://huggingface.co/mradermacher/Chocolatine-2-4B-Instruct-DPO-v2.1-GGUF)
- MLX (optimized for Apple silicon): [4Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-4Bit) / [8Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-8Bit)

**Ollama**: In addition to the Hugging Face release, quantized 4-bit and 8-bit variants are also available on Ollama [here](https://ollama.com/jpacifico/chocolatine-2.1) for convenient local inference.

## Benchmarks

The results indicate a consistent improvement across the tested French benchmarks, covering several capability types. This suggests a broad gain in French performance, while English results remain broadly stable.

| Benchmark (FR) | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| gpqa-fr:diamond | 28.93 | **32.49** |
| french_bench_arc_challenge | 47.13 | **49.79** |
| french_bench_grammar | 70.59 | **72.27** |
| french_bench_boolqa | 88.76 | **89.89** |
| french_bench_hellaswag | 56.99 | **58.03** |
| global_mmlu_fr | 63.75 | **64.75** |
| xwinograd_fr | 66.27 | **67.47** |
| fr_mt_bench | 6.22 | **6.44** |

*FR-MT-Bench* evaluation is performed on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), using [multilingual-mt-bench](https://github.com/jpacifico/multilingual_mt_bench) with OpenAI/GPT-5 as the LLM judge.  
*global_mmlu_fr*, *xwinograd_fr* and *french_bench* results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.  
*gpqa-fr:diamond* was evaluated with LightEval (vLLM backend), following the [kurakurai/Luth](https://github.com/kurakurai/Luth.git) evaluation process.

| Benchmark (EN) | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| arc_challenge | **58.79** | 58.45 |
| hellaswag | 69.08 | **70.16** |
| boolq | 84.80 | **85.32** |
| gpqa_diamond_zeroshot | **38.89** | 38.38 |

English benchmark results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
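
The per-benchmark differences behind these observations can be recomputed directly from the two tables above. The sketch below only restates the published scores; note that each difference is expressed in its benchmark's own unit (`fr_mt_bench` is an LLM-judge score, not an accuracy), so the differences should not be averaged across rows.

```python
# Scores copied from the FR and EN tables:
# (Qwen3-4B-Instruct-2507 base, Chocolatine-2-4B-Instruct-DPO-v2.1)
FR_SCORES = {
    "gpqa-fr:diamond": (28.93, 32.49),
    "french_bench_arc_challenge": (47.13, 49.79),
    "french_bench_grammar": (70.59, 72.27),
    "french_bench_boolqa": (88.76, 89.89),
    "french_bench_hellaswag": (56.99, 58.03),
    "global_mmlu_fr": (63.75, 64.75),
    "xwinograd_fr": (66.27, 67.47),
    "fr_mt_bench": (6.22, 6.44),
}
EN_SCORES = {
    "arc_challenge": (58.79, 58.45),
    "hellaswag": (69.08, 70.16),
    "boolq": (84.80, 85.32),
    "gpqa_diamond_zeroshot": (38.89, 38.38),
}


def score_deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Absolute difference (fine-tuned minus base), rounded to 2 decimals."""
    return {name: round(new - base, 2) for name, (base, new) in scores.items()}


for label, table in (("FR", FR_SCORES), ("EN", EN_SCORES)):
    for name, delta in score_deltas(table).items():
        print(f"[{label}] {name:<28}{delta:+.2f}")
```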

## Training & Alignment Pipeline  

Chocolatine-2-4B-Instruct-DPO-v2.1 is derived from [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) using a multi-step post-training pipeline:

**Stage 1 – DPO (Compar:IA adaptation)**

Direct Preference Optimization (DPO) on a DPO-adapted version of **[Compar:IA](https://comparia.beta.gouv.fr/datasets)** data, derived from the preference dataset [comparia-votes](https://huggingface.co/datasets/ministere-culture/comparia-votes), which is part of a public initiative led by the French Ministry of Culture. Previous iterations of the Chocolatine model series were also selected as part of this initiative.  
I constructed an original DPO dataset from these votes by transforming them into preference pairs (chosen / rejected), with additional filtering and formatting steps to make them suitable for DPO fine-tuning.  
Two dataset variants were created ([6k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-6k) and [13k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-13k) preference pairs).  
The **6k variant** was used for the DPO training reported in this release.
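
The vote-to-pair conversion can be pictured with the sketch below. The field names (`question`, `response_a`, `response_b`, `winner`) are hypothetical placeholders rather than the actual comparia-votes schema, and the filters shown (dropping ties, empty or identical answers, and exact duplicates) are illustrative only, since the real filtering steps are not detailed here. The output uses the `prompt` / `chosen` / `rejected` layout accepted by common DPO trainers such as TRL's `DPOTrainer`.

```python
from typing import Iterable


def votes_to_pairs(votes: Iterable[dict]) -> list[dict]:
    """Turn pairwise votes into prompt/chosen/rejected records (hypothetical schema)."""
    pairs: list[dict] = []
    seen: set[tuple[str, str, str]] = set()
    for vote in votes:
        winner = vote.get("winner")
        if winner not in ("a", "b"):
            continue  # ties or missing verdicts carry no preference
        prompt = (vote.get("question") or "").strip()
        answer_a = (vote.get("response_a") or "").strip()
        answer_b = (vote.get("response_b") or "").strip()
        if not prompt or not answer_a or not answer_b or answer_a == answer_b:
            continue  # unusable or degenerate comparison
        chosen, rejected = (answer_a, answer_b) if winner == "a" else (answer_b, answer_a)
        key = (prompt, chosen, rejected)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


example = [
    {"question": "Quelle est la capitale de la France ?", "response_a": "Paris.",
     "response_b": "Lyon.", "winner": "a"},
    {"question": "Quelle est la capitale de la France ?", "response_a": "Paris.",
     "response_b": "Paris.", "winner": "tie"},
]
print(votes_to_pairs(example))  # one pair: chosen "Paris.", rejected "Lyon."
```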

**Stage 2 – DPO (French-ORCA pairs)**

A second DPO stage uses French ORCA preference pairs from the dataset **[jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised)**, which is commonly used in the Chocolatine training pipeline.  
This stage further improves general instruction alignment, robustness across tasks, and cross-lingual capabilities.
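
Both DPO stages optimize the same kind of objective. The trainer and hyperparameters used are not reported here, so the following is only a minimal sketch of the standard per-pair DPO loss, computed from the summed log-probabilities of the chosen and rejected responses under the model being trained and under a frozen reference model; `beta=0.1` is an illustrative default, not the value used for this release.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more (in log space) the policy favors each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == softplus(-m), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Identical policy and reference: the margin is 0, so the loss is log(2), about 0.693
print(dpo_loss(-12.0, -18.0, -12.0, -18.0))
```

The loss falls below log(2) once the trained model favors the chosen response more strongly than the reference does, and rises above it in the opposite case.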

**Stage 3 – Model Merging (MergeKit + TIES)**

The resulting checkpoints were merged using **MergeKit** with the TIES method.

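TIES merging (TrIm, Elect Sign & Merge) works on each model's task vector, i.e. its parameter updates relative to the base model: it keeps only the largest-magnitude updates (the `density` fraction), elects a dominant sign for each parameter, and merges only the updates that agree with that sign. This reduces destructive interference between the merged models and helps preserve the stability of the base model.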

MergeKit configuration:

```yaml
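# ties2 recipe
models:
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test2
    parameters:
      density: 0.5   # keep the 50% largest-magnitude updates relative to the base model
      weight: 0.5    # relative contribution of this model to the merge
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test-b3
    parameters:
      density: 0.5
      weight: 0.5

merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507   # reference for computing each model's task vector

parameters:
  normalize: false   # do not divide the merged update by the sum of contributing weights
  int8_mask: true    # store intermediate masks as int8 to reduce memory use

dtype: bfloat16      # output weights in BF16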
```
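
To make the TIES steps and the `density`, `weight` and `normalize` settings more concrete, the sketch below applies a simplified TIES merge to a single NumPy array. It is an illustration under assumptions (magnitude-based trimming, sign election from the sum of weighted updates, ties resolved to the positive sign), not MergeKit's actual code, and it leaves out the memory-saving `int8_mask` option.

```python
import numpy as np


def trim(delta: np.ndarray, density: float) -> np.ndarray:
    """Keep the `density` fraction of entries with the largest magnitude, zero the rest."""
    k = int(density * delta.size)
    if k <= 0:
        return np.zeros_like(delta)
    flat = np.abs(delta).ravel()
    keep = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
    mask = np.zeros(delta.size, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(delta.shape), delta, 0.0)


def ties_merge(base, finetuned, densities, weights, normalize=False):
    """Simplified TIES merge of several fine-tuned tensors sharing one base tensor."""
    # 1. Trim: each task vector (fine-tuned minus base) keeps only its largest updates
    deltas = np.stack([trim(ft - base, d) for ft, d in zip(finetuned, densities)])
    w = np.asarray(weights, dtype=float).reshape(-1, *([1] * base.ndim))
    weighted = deltas * w
    # 2. Elect sign: per-parameter sign of the summed weighted updates (0 counts as +)
    elected = np.where(weighted.sum(axis=0) >= 0, 1.0, -1.0)
    # 3. Merge: keep only non-zero updates whose sign agrees with the elected sign
    agree = np.sign(weighted) == elected
    merged = (weighted * agree).sum(axis=0)
    if normalize:
        # Divide by the total weight of the models that contributed to each parameter
        divisor = (w * agree).sum(axis=0)
        divisor[divisor == 0] = 1.0
        merged = merged / divisor
    return base + merged


base = np.zeros(2)
models = [np.array([2.0, 0.1]), np.array([-1.0, 0.2])]
# The first parameter has conflicting signs: only the update matching the elected (+) sign is kept
print(ties_merge(base, models, densities=[1.0, 1.0], weights=[0.5, 0.5]))  # approximately [1.0, 0.15]
```

A plain weighted average would give 0.5 for the first parameter, because the opposing update from the second model cancels half of it; TIES instead discards the disagreeing update. With `normalize=False`, as in the recipe above, the kept updates are scaled by their weights but not renormalized.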

## Usage

The following code snippet illustrates how to use the model to generate content from a given input.  
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```

## Limitations  

The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanism.  

Developed by: Jonathan Pacifico, 2026   
Model type: LLM  
Language(s) (NLP): French, English  
License: Apache-2.0  

Made with ❤️ in France