---
language: en
license: apache-2.0
tags:
  - physics
  - high-energy-physics
  - hep
  - particle-physics
  - fine-tuned
  - qwen3.5
  - amd-mi300x
  - rocm
base_model: Qwen/Qwen3.5-9B
datasets:
  - arxiv-hep
  - inspire-hep
  - cms-open-data
  - pdg-particle-data
pipeline_tag: text-generation
library_name: transformers
model_size: 9B
widget:
  - text: "What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?"
    example_title: Invariant mass calculation
  - text: "Explain the CMS detector architecture and its main subsystems."
    example_title: Detector explanation
  - text: "A Z boson decays at rest into an electron-positron pair. What is the electron momentum?"
    example_title: Decay kinematics
model-index:
  - name: hep-agent-qwen-qwen3-5-9b-mi300x
    results:
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: cais/mmlu
          config: all
        metrics:
          - name: MMLU (5-shot)
            type: acc
            value: 70.6
            verified: false
      - task:
          type: text-generation
        dataset:
          name: ARC Challenge
          type: allenai/ai2_arc
          config: ARC-Challenge
        metrics:
          - name: ARC-Challenge (25-shot, norm)
            type: acc_norm
            value: 71.8
            verified: false
      - task:
          type: text-generation
        dataset:
          name: MMLU Conceptual Physics
          type: cais/mmlu
          config: conceptual_physics
        metrics:
          - name: MMLU Conceptual Physics (5-shot)
            type: acc
            value: 77.9
            verified: false
      - task:
          type: text-generation
        dataset:
          name: MMLU College Physics
          type: cais/mmlu
          config: college_physics
        metrics:
          - name: MMLU College Physics (5-shot)
            type: acc
            value: 58.8
            verified: false
      - task:
          type: text-generation
        dataset:
          name: MMLU High School Physics
          type: cais/mmlu
          config: high_school_physics
        metrics:
          - name: MMLU High School Physics (5-shot)
            type: acc
            value: 62.9
            verified: false
      - task:
          type: text-generation
        dataset:
          name: MMLU Astronomy
          type: cais/mmlu
          config: astronomy
        metrics:
          - name: MMLU Astronomy (5-shot)
            type: acc
            value: 80.9
            verified: false
---

# hep-agent-qwen-qwen3-5-9b-mi300x

> **HEP domain expert:** Qwen/Qwen3.5-9B fine-tuned on High Energy Physics data.

This model is a full fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on a curated
corpus of High Energy Physics literature, experimental data, and synthetic Q&A.
Trained on a single AMD MI300X (192 GB HBM3, ROCm 7.0).


## Model Overview

| Property | Value |
|----------|-------|
| Base model | `Qwen/Qwen3.5-9B` |
| Fine-tuning type | Full fine-tune (NOT LoRA) |
| Hardware | 1× AMD MI300X (192 GB HBM3, ROCm 7.0) |
| Precision | bfloat16 |
| Context length | 2048 tokens |
| Training data | ~50K–100K HEP examples |
| Optimizer | AdamW 8-bit (bitsandbytes) |
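
As a rough sanity check on why a full fine-tune can fit on one 192 GB device, the sketch below estimates the static memory for weights, gradients, and optimizer state. It is a back-of-the-envelope estimate, not a measurement from this training run. It assumes a nominal 9e9 parameters, bf16 gradients, and no fp32 master weights, and it ignores activations and temporary buffers. The function name is hypothetical.

```python
def static_training_memory_gb(num_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=2):
    """Rough memory for weights + gradients + optimizer state, in decimal GB.

    Defaults assume bf16 weights and gradients (2 bytes each) and 8-bit AdamW,
    which keeps two 1-byte moment estimates per parameter.
    """
    return num_params * (weight_bytes + grad_bytes + optimizer_bytes) / 1e9


# Assumption: "9B" taken at face value; the exact parameter count is not stated here.
NOMINAL_PARAMS = 9e9

# 8-bit AdamW, as listed in the table above
print(static_training_memory_gb(NOMINAL_PARAMS))
# For comparison: fp32 AdamW moments (2 states x 4 bytes)
print(static_training_memory_gb(NOMINAL_PARAMS, optimizer_bytes=8))
```

Under these assumptions the 8-bit optimizer needs about 54 GB of static state versus about 108 GB with fp32 moments, which leaves more headroom for activations.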

## Evaluation Results

All scores are accuracy (%) unless noted, compared against the unmodified `Qwen/Qwen3.5-9B` base model. Δ is the fine-tuned score minus the base score, in percentage points (pp).

### General Benchmarks

| Benchmark | Shots | Metric | Base (%) | Fine-tuned (%) | Δ |
|-----------|-------|--------|----------|----------------|---|
| MMLU Full | 5 | acc | 69.8 | **70.6** | +0.7 |
| ARC-Challenge | 25 | acc_norm | 71.1 | **71.8** | +0.7 |

No significant regressions were detected (threshold: −3 pp).
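
The regression check described above can be sketched in a few lines. This is an illustration only, not the evaluation code used for this card: the function name is hypothetical, and treating "threshold: −3 pp" as "a drop of more than 3 pp" is an assumption.

```python
# Scores copied from the General Benchmarks table (percent).
scores = {
    "MMLU Full": {"base": 69.8, "fine_tuned": 70.6},
    "ARC-Challenge": {"base": 71.1, "fine_tuned": 71.8},
}

REGRESSION_THRESHOLD_PP = -3.0  # threshold stated above


def find_regressions(scores, threshold_pp=REGRESSION_THRESHOLD_PP):
    """Return benchmarks whose fine-tuned score fell below the base by more than the threshold."""
    flagged = {}
    for name, s in scores.items():
        delta = s["fine_tuned"] - s["base"]
        if delta < threshold_pp:  # assumption: strictly worse than -3 pp counts as a regression
            flagged[name] = round(delta, 1)
    return flagged


print(find_regressions(scores))  # {} for the two benchmarks above
```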

### MMLU Physics Subsets (extracted from MMLU Full run)

| Subset | Base (%) | Fine-tuned (%) | Δ |
|--------|----------|----------------|---|
| Conceptual Physics | 77.0 | **77.9** | +0.9 |
| College Physics | 57.8 | **58.8** | +1.0 |
| High School Physics | 60.9 | **62.9** | +2.0 |
| Astronomy | 80.3 | **80.9** | +0.7 |
| **Physics avg** | **69.0** | **70.1** | **+1.1** |

MMLU STEM aggregate: Base 68.3% → Fine-tuned 68.7% (+0.4 pp).
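
The "Physics avg" row is consistent with an unweighted (macro) mean of the four subsets, which the sketch below reproduces from the table values. That the row was computed this way is an inference from the numbers, not something the evaluation logs confirm. A question-weighted average would generally differ because the subsets have different sizes.

```python
# Subset accuracies copied from the table above: (base %, fine-tuned %).
physics_subsets = {
    "Conceptual Physics": (77.0, 77.9),
    "College Physics": (57.8, 58.8),
    "High School Physics": (60.9, 62.9),
    "Astronomy": (80.3, 80.9),
}


def unweighted_mean(values):
    """Plain arithmetic mean; every subset counts equally regardless of size."""
    values = list(values)
    return sum(values) / len(values)


base_avg = unweighted_mean(base for base, _ in physics_subsets.values())
ft_avg = unweighted_mean(ft for _, ft in physics_subsets.values())
print(f"base={base_avg:.1f}  fine-tuned={ft_avg:.1f}  delta={ft_avg - base_avg:+.1f}")
```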

### Custom Physics Calculations (8 problems)

| Category | Base (%) | Fine-tuned (%) |
|----------|----------|----------------|
| Four-vectors | 50.0 | 50.0 |
| Invariant mass | 0.0 | 0.0 |
| Decay kinematics | 0.0 | 0.0 |
| Branching ratios | 0.0 | 0.0 |
| Kinematics (pT/η) | 0.0 | 0.0 |
| **Overall (exact match)** | **12.5** | **12.5** |

> **Note:** This custom benchmark covers only 8 problems and uses strict exact-match numeric scoring,
> so each problem is worth 12.5 pp. Both models often reason correctly in the response text but fail
> the final answer-extraction step (e.g., outputting an intermediate value rather than the final
> result in the expected units). A lenient scoring pass would likely yield higher effective accuracy,
> but none has been run yet. The benchmark will be expanded in a future evaluation run.
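
To make the scoring distinction concrete, here is a minimal sketch that contrasts strict exact-match with one possible lenient rule: take the last number in the response and compare it within a relative tolerance. Both functions and the 1% tolerance are hypothetical and are not the scorer used for the table above.

```python
import math
import re

# Matches integers, decimals, and scientific notation, e.g. "125", "45.6", "1.2e-3".
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def strict_match(response: str, expected: str) -> bool:
    """Exact string match on the whole (stripped) response."""
    return response.strip() == expected.strip()


def lenient_match(response: str, expected: float, rel_tol: float = 0.01) -> bool:
    """Compare the last number found in the response with the expected value."""
    numbers = NUMBER_RE.findall(response)
    if not numbers:
        return False
    return math.isclose(float(numbers[-1]), expected, rel_tol=rel_tol)


response = "The invariant mass is m = 2E = 125.0 GeV"
print(strict_match(response, "125"))   # False: the response is a sentence, not a bare number
print(lenient_match(response, 125.0))  # True: the last number is within 1% of 125
```

Taking the last number is itself fragile: a response that ends with an intermediate value or a unit conversion would still be scored wrong, which is the failure mode the note describes.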

### Benchmarks Not Yet Available

The following benchmarks encountered infrastructure errors during this evaluation run and will be
included in a future update:

| Benchmark | Intended Purpose | Blocker |
|-----------|-----------------|---------|
| SciQ | Science Q&A | HF dataset URI format incompatibility |
| GSM8K | Math reasoning | HF dataset URI format incompatibility |
| TruthfulQA mc1/mc2 | Hallucination resistance | HF dataset URI format incompatibility |
| HellaSwag | Commonsense forgetting check | HF dataset URI format incompatibility |
| IFEval | Instruction following | Missing `immutabledict` package |
| Minerva MATH | Advanced math | Missing `antlr4` package (LaTeX parsing) |
| BBQ | Bias evaluation | Task not registered in harness version |
| HEP-QA (held-out) | Domain Q&A | Evaluation module path error |

## Intended Use

This model is designed for:
- Answering questions about experimental and theoretical particle physics
- Explaining detector physics, collision analysis, and data analysis
- Solving quantitative physics problems (kinematics, cross-sections, decay calculations)
- Summarizing HEP papers and explaining their methodology

**Not intended for:**
- Real-time experimental analysis or ROOT file processing
- Safety-critical applications
- Medical or regulatory decisions

## Training Data

| Source | Volume | Description |
|--------|--------|-------------|
| arXiv hep-ph / hep-ex | ~10K papers → Q&A | Theory, phenomenology, experimental |
| INSPIRE-HEP | ~15K records | Paper summaries, detector data |
| CMS Open Data | ~5K examples | Collision analysis, ROOT metadata |
| PDG (Particle Data Group) | ~3K entries | Particle properties, decay modes |
| Synthetic Q&A | ~20K generated | Kinematics, formulas, calculations |

## Training Configuration

| Parameter | Value |
|-----------|-------|
| learning_rate | 8e-06 |
| num_epochs | 2 |
| batch_size (effective) | 32 |
| sequence_length | 4096 |
| optimizer | adamw_8bit |
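
For orientation, the sketch below estimates the number of optimizer steps implied by the effective batch size and epoch count above. It is an estimate under assumptions, not a figure from the training logs: it counts each "~" volume in the Training Data table as that many examples (a paper may yield several Q&A pairs), assumes no sequence packing, and keeps the last partial batch.

```python
import math

# Approximate volumes from the Training Data table, treated as example counts.
source_sizes = {
    "arXiv hep-ph / hep-ex": 10_000,
    "INSPIRE-HEP": 15_000,
    "CMS Open Data": 5_000,
    "PDG": 3_000,
    "Synthetic Q&A": 20_000,
}

EFFECTIVE_BATCH_SIZE = 32  # batch_size (effective) from the table above
NUM_EPOCHS = 2             # num_epochs from the table above


def optimizer_steps(num_examples, batch_size=EFFECTIVE_BATCH_SIZE, epochs=NUM_EPOCHS):
    """Steps per epoch (rounded up to include the last partial batch) times epochs."""
    return math.ceil(num_examples / batch_size) * epochs


total_examples = sum(source_sizes.values())
print(total_examples, optimizer_steps(total_examples))  # 53000 3314
```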

## Usage

### Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# ChatML format (for Qwen base)
prompt = """<|im_start|>system
You are an expert particle physicist.<|im_end|>
<|im_start|>user
What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### 4-bit Quantized Inference (bitsandbytes)

```bash
# Install the pinned Transformers release
# (prefix each command with "!" when running inside a notebook cell)
pip install -U transformers==5.5.0

# Install the remaining dependencies
pip install -U accelerate bitsandbytes sentencepiece protobuf peft trl

# Optional
pip install -U unsloth
```

```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch

model_name = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=bnb_config,
)


prompt = "Explain what a jet detector is in particle physics."

messages = [
    {"role": "user", "content": prompt}
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(
    text,
    return_tensors="pt"
).to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.5,
        do_sample=True,
        top_p=0.9,
    )

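# Decode only the newly generated tokens so the prompt is not echoed back
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)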

print(response)
```


### vLLM Server (Recommended for Production)

```bash
# Install vLLM with ROCm support
pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm7.0

# Launch server
vllm serve rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --port 8000
```

```python
from openai import OpenAI
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x",
    messages=[{"role": "user", "content": "Explain the CMS detector architecture."}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```

## Limitations

- Knowledge cutoff reflects training data (primarily pre-2025 papers)
- May hallucinate specific numerical values; always verify them against the PDG listings (pdgLive)
- Not trained for function-calling or tool-use tasks
- Quantitative calculations: the reasoning approach is often correct, but the strict exact-match
  score on the 8-problem custom benchmark is low (12.5%); verify numerical outputs independently
- Limited coverage of very recent experimental results
- Several planned benchmarks (GSM8K, HellaSwag, TruthfulQA) could not run due to harness
  infrastructure issues; results will be added in a follow-up evaluation
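
Because numerical outputs should be verified independently, a few lines of plain Python are often enough to cross-check simple kinematics answers. The sketch below checks the two quantitative widget prompts from this card. The helper names are hypothetical, natural units (c = 1) are assumed, and the Z boson and electron masses are approximate values hard-coded for illustration; consult the PDG for current figures.

```python
import math


def invariant_mass(p1, p2):
    """Invariant mass of two particles from four-vectors (E, px, py, pz) in GeV, with c = 1."""
    e = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding


def two_body_momentum(parent_mass, m1, m2):
    """Momentum of each daughter when a parent at rest decays into masses m1 and m2."""
    term_a = parent_mass**2 - (m1 + m2) ** 2
    term_b = parent_mass**2 - (m1 - m2) ** 2
    return math.sqrt(term_a * term_b) / (2.0 * parent_mass)


# Two 62.5 GeV photons travelling back-to-back along z.
photon_1 = (62.5, 0.0, 0.0, 62.5)
photon_2 = (62.5, 0.0, 0.0, -62.5)
print(invariant_mass(photon_1, photon_2))  # 125.0

# Z -> e+ e- at rest. Masses in GeV are assumed, approximate values.
M_Z = 91.1876
M_E = 0.000511
print(two_body_momentum(M_Z, M_E, M_E))  # very close to M_Z / 2
```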

## Citation

```bibtex
@misc{hep-agent-mi300x-2026,
  title        = {HEP-Agent: Full Fine-Tuning of Qwen/Qwen3.5-9B on High Energy Physics Data},
  author       = {Rathod, Rajveer},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x}},
  note         = {Fine-tuned on AMD MI300X (ROCm 7.0) using Unsloth acceleration}
}
```

## License

Apache License 2.0.

Base model weights are subject to their own license:
[Qwen/Qwen3.5-9B License](https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE)