---
license: cc-by-4.0
datasets:
- allenai/c4
language:
- en
metrics:
- accuracy
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
pipeline_tag: text-generation
tags:
- gptq
- int4
- quantized
- qlora
- medical
- medqa
- biology
- chemistry
- finance
- legal
- climate
- reasoning
- 4-bit
model-index:
- name: Chaperone-Thinking-LQ-1.0
  results:
  - task:
      type: text-generation
      name: Medical QA
    dataset:
      name: MedQA
      type: medqa
    metrics:
    - type: accuracy
      value: 84.0
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: MATH-500
      type: math-500
    metrics:
    - type: accuracy
      value: 91.9
  - task:
      type: text-generation
      name: Math Competition
    dataset:
      name: AIME 2024
      type: aime-2024
    metrics:
    - type: accuracy
      value: 66.7
  - task:
      type: text-generation
      name: Graduate-Level QA
    dataset:
      name: GPQA Diamond
      type: gpqa-diamond
    metrics:
    - type: accuracy
      value: 56.7
  - task:
      type: text-generation
      name: Knowledge Understanding
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - type: accuracy
      value: 85.9
  - task:
      type: text-generation
      name: Math Reasoning
    dataset:
      name: GSM8K-Platinum
      type: gsm8k
    metrics:
    - type: accuracy
      value: 84.04
  - task:
      type: text-generation
      name: Instruction Following
    dataset:
      name: IFEval
      type: ifeval
    metrics:
    - type: accuracy
      value: 83.34
  - task:
      type: text-generation
      name: Knowledge Understanding
    dataset:
      name: MMLU-PRO
      type: mmlu-pro
    metrics:
    - type: accuracy
      value: 65.76
---

# Chaperone-Thinking-LQ-1.0

A domain-optimized reasoning model built on **DeepSeek-R1-Distill-Qwen-32B**, refined through a multi-stage pipeline of GPTQ quantization-aware training and QLoRA fine-tuning. Achieves **84% on MedQA** — within 4 points of GPT-4o — in a ~20GB package that fits on a single L40/L40s GPU.

**Fully open-source under CC-BY-4.0.**

---

## Highlights

- **Base model:** DeepSeek-R1-Distill-Qwen-32B (32B parameters)
- **Size reduction:** ~60GB → ~20GB (4-bit GPTQ)
- **MedQA accuracy:** 84% (GPT-4o: ~88%)
- **Hardware target:** Runs on a single NVIDIA L40, L40s, or A100 GPU
- **License:** CC-BY-4.0

---

## How We Built It

This model is **not** a simple quantization. It was produced through a four-stage pipeline:

| Stage | Method | What it does |
|-------|--------|-------------|
| **1. Quantization** | 4-bit GPTQ | Compresses weights from ~60GB to ~20GB for efficient inference |
| **2. Quantization-Aware Training** | GPTQ-based QAT with calibration | Minimizes accuracy loss during quantization by optimizing scale/zero-point parameters against a calibration dataset |
| **3. Domain Fine-Tuning** | QLoRA | Adapts the quantized model on medical and scientific corpora, recovering and improving accuracy for domain-specific reasoning |
| **4. Transparency** | Adaptive layer removal | Removes the identity adaptive layer so the model correctly attributes its foundational architecture to its original creators |

---

## Benchmark Results

### MedQA

| Model | Accuracy |
|-------|----------|
| **Chaperone-Thinking-LQ-1.0** | **84%** |
| GPT-4o | 88% |

### Multi-Model Comparison

| Benchmark | DeepSeek-R1 | OpenAI-o1-1217 | DeepSeek-R1-32B | OpenAI-o1-mini | **Chaperone-Thinking-LQ-1.0** |
|-----------|:-----------:|:--------------:|:---------------:|:--------------:|:----------------------------:|
| **AIME 2024** | 79.8 | 79.2 | 72.6 | 63.6 | **66.7** |
| **GPQA Diamond** | 71.5 | 75.7 | 62.1 | 60.0 | **56.7** |
| **MATH-500** | 97.3 | 96.4 | 94.3 | 90.0 | **91.9** |
| **MMLU** | 90.8 | 91.8 | 87.4 | 85.2 | **85.9** |

> Chaperone-Thinking-LQ-1.0 delivers competitive performance against full-precision frontier models at ~3x smaller model size.


### Speed & Latency

| Metric | Chaperone-Thinking-LQ-1.0 | DeepSeek-R1-Distill-Qwen-32B |
|--------|--------------------------|------------------------------|
| Throughput | **36.86 tok/s** | 22.84 tok/s |
| Latency p50 | **11.49s** | 20.10s |
| Latency p95 | **13.06s** | 20.11s |

> 1.6x higher throughput with ~43% lower median latency.
> Averages over 10 trials, concurrency=1, max_tokens=512, temperature=0.

---

## Model Details

| | |
|---|---|
| **Base model** | DeepSeek-R1-Distill-Qwen-32B |
| **Parameters** | 32 billion |
| **Quantization** | 4-bit GPTQ |
| **Fine-tuning** | QLoRA on medical/scientific corpora |
| **Model size** | ~20GB |
| **Precision** | torch.float16 |
| **Evaluation hardware** | NVIDIA A100 80GB PCIe |
| **CUDA** | 12.4 |
| **PyTorch** | 2.6.0+cu124 |

---

## Intended Use

- Medical and clinical reasoning tasks
- Scientific Q&A and research workflows
- Enterprise deployments requiring data sovereignty (on-premises, private cloud)
- Domain-specific text analysis and insight extraction

---

## Limitations

- 4-bit quantization introduces some accuracy trade-off on general benchmarks vs. the full-precision base model
- Domain fine-tuning is optimized for medical/scientific reasoning; general-purpose performance may differ
- Not intended as a replacement for professional medical judgment

---

## Citation

If you use this model, please cite:

```bibtex
@misc{chaperone-thinking-lq,
  title={Chaperone-Thinking-LQ-1.0: Domain-Optimized Reasoning via GPTQ-QAT and QLoRA},
  author={Empirisch Technologies},
  year={2025},
  url={https://huggingface.co/empirischtech}
}
```

---

## Links

- **Website:** [chaperoneai.net](https://chaperoneai.net/benchmark)
- **Hugging Face:** [[empirischtech](https://huggingface.co/empirischtech)](https://empirischtech.at/)