---
language:
- en
license: apache-2.0
tags:
- merge
- mergekit
- slerp
- qwen2.5
- deepseek-r1
- reasoning
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
---

# Qwen2.5-1.5B-R1-SLERP

A SLERP merge (t=0.5) of:
- [`Qwen/Qwen2.5-1.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct): strong general instruction following
- [`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B): chain-of-thought reasoning distilled from DeepSeek-R1

Part of a systematic merge study on the Qwen2.5-1.5B family. See also:
- [`Mohaaxa/Qwen2.5-1.5B-R1-SLERP-AWQ`](https://huggingface.co/Mohaaxa/Qwen2.5-1.5B-R1-SLERP-AWQ): AWQ 4-bit quantized version

## Benchmarks

Evaluated against both parent models on perplexity (PPL, Wikitext-2; lower is better) and GSM8K accuracy (100 samples; higher is better):

| Model | PPL | GSM8K |
|-------|-----|-------|
| Qwen2.5-1.5B-Instruct (parent) | 16.141 | 38.0% |
| DeepSeek-R1-Distill-Qwen-1.5B (parent) | 107.467 | 3.0% |
| **Qwen2.5-1.5B-R1-SLERP (this model)** | 1205.427 | 2.0% |

The merge scores worse than both parents on both metrics:

- PPL delta vs. Instruct parent: +1189.286
- GSM8K delta vs. Instruct parent: -36.0 percentage points
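
The deltas are plain differences against the Instruct parent. A quick check in Python, using only the values from the table (the variable names are illustrative):

```python
# Values copied from the benchmark table above.
instruct = {"ppl": 16.141, "gsm8k": 38.0}
merged = {"ppl": 1205.427, "gsm8k": 2.0}

# PPL: lower is better, so a positive delta is a regression.
ppl_delta = round(merged["ppl"] - instruct["ppl"], 3)
# GSM8K: accuracy difference, expressed in percentage points.
gsm8k_delta = round(merged["gsm8k"] - instruct["gsm8k"], 1)

print(f"PPL delta: {ppl_delta:+.3f}")
print(f"GSM8K delta: {gsm8k_delta:+.1f} pp")
```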

## Merge Config

```yaml
merge_method: slerp
base_model:
  model: Qwen/Qwen2.5-1.5B-Instruct
slices:
  - sources:
      - model: Qwen/Qwen2.5-1.5B-Instruct
        layer_range: [0, 28]
      - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
        layer_range: [0, 28]
    parameters:
      t: 0.5
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Mohaaxa/Qwen2.5-1.5B-R1-SLERP",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Mohaaxa/Qwen2.5-1.5B-R1-SLERP")
```

## Notes

- t=0.5 gives equal weight to both parents
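- SLERP interpolates along the arc between the two weight vectors, so it preserves weight magnitude better than linear interpolation, which shrinks the norm when the vectors point in different directions
- Both parents share the same Qwen2.5 architecture (28 layers, hidden_dim=1536)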
- For a quantized version with ~67% VRAM reduction, use the AWQ variant
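
To illustrate the SLERP notes above, here is a minimal pure-Python sketch of spherical interpolation between two flat weight vectors. It is an illustration of the general technique, not mergekit's actual code: the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for near-parallel or zero vectors are assumptions made for this example.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors (illustrative sketch)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Degenerate case: a zero vector has no direction, so fall back to linear interpolation.
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, from their normalized dot product.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Nearly parallel vectors: SLERP and linear interpolation coincide.
    if sin_theta < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Two orthogonal unit vectors, merged with equal weight (t=0.5).
a, b = [1.0, 0.0], [0.0, 1.0]
merged = slerp(0.5, a, b)
linear = [0.5 * x + 0.5 * y for x, y in zip(a, b)]

print(norm(merged))  # stays at unit length (up to float rounding)
print(norm(linear))  # shrinks to about 0.707
```

In this toy case the SLERP result keeps unit norm while the linear average loses about 29% of it. Preserving norm is a geometric property only; as the benchmark table shows, it does not by itself guarantee that the merged model performs well.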