---
language: en
license: apache-2.0
tags:
- physics
- high-energy-physics
- hep
- particle-physics
- fine-tuned
- qwen3.5
- amd-mi300x
- rocm
base_model: Qwen/Qwen3.5-9B
datasets:
- arxiv-hep
- inspire-hep
- cms-open-data
- pdg-particle-data
pipeline_tag: text-generation
library_name: transformers
model_size: 9B
widget:
- text: "What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?"
  example_title: Invariant mass calculation
- text: "Explain the CMS detector architecture and its main subsystems."
  example_title: Detector explanation
- text: "A Z boson decays at rest into an electron-positron pair. What is the electron momentum?"
  example_title: Decay kinematics
model-index:
- name: hep-agent-qwen-qwen3-5-9b-mi300x
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: cais/mmlu
      config: all
    metrics:
    - name: MMLU (5-shot)
      type: acc
      value: 70.6
      verified: false
  - task:
      type: text-generation
    dataset:
      name: ARC Challenge
      type: allenai/ai2_arc
      config: ARC-Challenge
    metrics:
    - name: ARC-Challenge (25-shot, norm)
      type: acc_norm
      value: 71.8
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU Conceptual Physics
      type: cais/mmlu
      config: conceptual_physics
    metrics:
    - name: MMLU Conceptual Physics (5-shot)
      type: acc
      value: 77.9
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU College Physics
      type: cais/mmlu
      config: college_physics
    metrics:
    - name: MMLU College Physics (5-shot)
      type: acc
      value: 58.8
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU High School Physics
      type: cais/mmlu
      config: high_school_physics
    metrics:
    - name: MMLU High School Physics (5-shot)
      type: acc
      value: 62.9
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU Astronomy
      type: cais/mmlu
      config: astronomy
    metrics:
    - name: MMLU Astronomy (5-shot)
      type: acc
      value: 80.9
      verified: false
---
# hep-agent-qwen-qwen3-5-9b-mi300x
> **HEP domain expert:** `Qwen/Qwen3.5-9B` fine-tuned on High Energy Physics data.

This model is a full fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on a curated
corpus of High Energy Physics literature, experimental data, and synthetic Q&A.
It was trained on a single AMD MI300X (192 GB HBM3, ROCm 7.0).
## Model Overview
| Property | Value |
|----------|-------|
| Base model | `Qwen/Qwen3.5-9B` |
| Fine-tuning type | Full fine-tune (NOT LoRA) |
| Hardware | 1× AMD MI300X (192 GB HBM3, ROCm 7.0) |
| Precision | bfloat16 |
| Context length | 2048 tokens |
| Training data | ~50K–100K HEP examples |
| Optimizer | AdamW 8-bit (bitsandbytes) |
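
As a rough sanity check on why a full fine-tune of a 9B model can fit on one 192 GB device, the sketch below tallies memory per parameter. The byte counts are assumptions, not measurements from the actual run: bf16 weights and gradients at 2 bytes each, and two 8-bit AdamW states at 1 byte each. Activations, quantization block constants, and any fp32 master weights are ignored, and `training_state_gb` is a hypothetical helper name.

```python
def training_state_gb(n_params: float,
                      weight_bytes: float = 2.0,   # bf16 weights (assumed)
                      grad_bytes: float = 2.0,     # bf16 gradients (assumed)
                      optim_bytes: float = 2.0):   # two 8-bit AdamW states (assumed)
    """Rough memory for weights + gradients + optimizer state, in GB (1e9 bytes)."""
    per_param = weight_bytes + grad_bytes + optim_bytes
    return n_params * per_param / 1e9


# Nominal 9B parameter count taken from the model name.
estimate = training_state_gb(9e9)
print(f"~{estimate:.0f} GB of 192 GB HBM3 before activations")
```

Under these assumptions the static training state takes well under half of the device memory, leaving the remainder for activations at the chosen batch size and sequence length.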
## Evaluation Results
All scores are accuracy (%) unless noted. Δ is the change in percentage points (pp) relative to the unmodified `Qwen/Qwen3.5-9B` base.
### General Benchmarks
| Benchmark | Shots | Metric | Base (%) | Fine-tuned (%) | Δ |
|-----------|-------|--------|----------|----------------|---|
| MMLU Full | 5 | acc | 69.8 | **70.6** | +0.7 |
| ARC-Challenge | 25 | acc_norm | 71.1 | **71.8** | +0.7 |
No significant regressions were detected (threshold: −3 pp).
### MMLU Physics Subsets (extracted from MMLU Full run)
| Subset | Base (%) | Fine-tuned (%) | Δ |
|--------|----------|----------------|---|
| Conceptual Physics | 77.0 | **77.9** | +0.9 |
| College Physics | 57.8 | **58.8** | +1.0 |
| High School Physics | 60.9 | **62.9** | +2.0 |
| Astronomy | 80.3 | **80.9** | +0.7 |
| **Physics avg** | **69.0** | **70.1** | **+1.1** |
MMLU STEM aggregate: Base 68.3% → Fine-tuned 68.7% (+0.4 pp).
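
The **Physics avg** row is consistent with an unweighted mean of the four subsets. The sketch below recomputes it from the table values. Treating the row as a macro-average (not weighted by question count) is an assumption that happens to reproduce the reported numbers.

```python
# Per-subset accuracy (%) copied from the table above: (base, fine-tuned).
subsets = {
    "Conceptual Physics": (77.0, 77.9),
    "College Physics": (57.8, 58.8),
    "High School Physics": (60.9, 62.9),
    "Astronomy": (80.3, 80.9),
}

# Unweighted (macro) average over the four subsets.
base_avg = sum(base for base, _ in subsets.values()) / len(subsets)
tuned_avg = sum(tuned for _, tuned in subsets.values()) / len(subsets)

print(f"base {base_avg:.1f}  fine-tuned {tuned_avg:.1f}  delta {tuned_avg - base_avg:+.1f}")
```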
### Custom Physics Calculations (8 problems)
| Category | Base (%) | Fine-tuned (%) |
|----------|----------|----------------|
| Four-vectors | 50.0 | 50.0 |
| Invariant mass | 0.0 | 0.0 |
| Decay kinematics | 0.0 | 0.0 |
| Branching ratios | 0.0 | 0.0 |
| Kinematics (pT/η) | 0.0 | 0.0 |
| **Overall (exact match)** | **12.5** | **12.5** |
> **Note:** This custom benchmark covers only 8 problems and uses strict exact-match numeric scoring,
> so a single problem shifts the overall score by 12.5 pp. Both models often show correct reasoning
> in the response text but fail the final answer-extraction step (e.g., outputting an intermediate
> value rather than the final result in the expected units). A more lenient scoring pass would likely
> yield higher effective accuracy, but no such pass is reported here. The benchmark will be expanded
> in a future evaluation run.
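
To illustrate the gap between strict and lenient scoring described in the note, here is a minimal sketch. It is not the scorer used for the table above, whose implementation is not shown in this card. The helper names, the "last number in the response" extraction rule, and the 1% relative tolerance are all assumptions chosen for illustration.

```python
import math
import re

# Matches plain and scientific-notation numbers such as 125, 62.5, or 1.2e-3.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def extract_last_number(text: str):
    """Return the last numeric literal in the text, or None if there is none."""
    matches = _NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def score_exact(response: str, expected: float) -> bool:
    """Strict scoring: the last number in the response must equal the reference."""
    value = extract_last_number(response)
    return value is not None and value == expected


def score_lenient(response: str, expected: float, rel_tol: float = 0.01) -> bool:
    """Lenient scoring: any number in the response within rel_tol of the reference."""
    return any(
        math.isclose(float(match), expected, rel_tol=rel_tol)
        for match in _NUMBER.findall(response)
    )


# The correct result appears, but an input value is the last number in the text.
response = "The invariant mass is m = 125.0 GeV, using E = 62.5 GeV per photon."
print(score_exact(response, 125.0), score_lenient(response, 125.0))
```

A response like this one fails the strict check because the last number extracted is an input value, while the lenient check accepts it. The lenient rule can also produce false positives, so it complements rather than replaces strict scoring.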
### Benchmarks Not Yet Available
The following benchmarks encountered infrastructure errors during this evaluation run and will be
included in a future update:
| Benchmark | Intended Purpose | Blocker |
|-----------|-----------------|---------|
| SciQ | Science Q&A | HF dataset URI format incompatibility |
| GSM8K | Math reasoning | HF dataset URI format incompatibility |
| TruthfulQA mc1/mc2 | Hallucination resistance | HF dataset URI format incompatibility |
| HellaSwag | Commonsense forgetting check | HF dataset URI format incompatibility |
| IFEval | Instruction following | Missing `immutabledict` package |
| Minerva MATH | Advanced math | Missing `antlr4` package (LaTeX parsing) |
| BBQ | Bias evaluation | Task not registered in harness version |
| HEP-QA (held-out) | Domain Q&A | Evaluation module path error |
## Intended Use
This model is designed for:
- Answering questions about experimental and theoretical particle physics
- Explaining detector physics, collision analysis, and data analysis
- Solving quantitative physics problems (kinematics, cross-sections, decay calculations)
- Summarizing HEP papers and explaining their methodology

**Not intended for:**
- Real-time experimental analysis or ROOT file processing
- Safety-critical applications
- Medical or regulatory decisions
## Training Data
| Source | Volume | Description |
|--------|--------|-------------|
| arXiv hep-ph / hep-ex | ~10K papers → Q&A | Theory, phenomenology, experimental |
| INSPIRE-HEP | ~15K records | Paper summaries, detector data |
| CMS Open Data | ~5K examples | Collision analysis, ROOT metadata |
| PDG (Particle Data Group) | ~3K entries | Particle properties, decay modes |
| Synthetic Q&A | ~20K generated | Kinematics, formulas, calculations |
## Training Configuration
| Parameter | Value |
|-----------|-------|
| learning_rate | 8e-06 |
| num_epochs | 2 |
| batch_size (effective) | 32 |
| sequence_length | 4096 |
| optimizer | adamw_8bit |
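
The configuration above implies a rough training budget, sketched below. The per-source counts are read off the Training Data table and assume one training example per listed paper, record, or entry, which may undercount if a paper yields several Q&A pairs. The step counts are therefore an estimate, not a log from the actual run.

```python
import math

# Approximate example counts per source (assumption: one example per listed item).
source_sizes = {
    "arxiv": 10_000,
    "inspire": 15_000,
    "cms_open_data": 5_000,
    "pdg": 3_000,
    "synthetic": 20_000,
}

# Values from the Training Configuration table.
effective_batch = 32
num_epochs = 2
sequence_length = 4096

total_examples = sum(source_sizes.values())
steps_per_epoch = math.ceil(total_examples / effective_batch)
total_steps = steps_per_epoch * num_epochs
# Upper bound: every sequence padded or packed to the full length.
max_tokens_per_step = effective_batch * sequence_length

print(total_examples, steps_per_epoch, total_steps, max_tokens_per_step)
```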
## Usage
### Basic Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# ChatML format (for Qwen base)
prompt = """<|im_start|>system
You are an expert particle physicist.<|im_end|>
<|im_start|>user
What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?<|im_end|>
<|im_start|>assistant
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Greedy decoding for a reproducible answer
    output = model.generate(**inputs, max_new_tokens=300, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
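
The prompt in the example above has a closed-form answer, which makes it a convenient check on the model's output. For two massless particles, m² = 2·E₁·E₂·(1 − cos θ) in natural units (c = 1). The sketch below evaluates this for the back-to-back case; `diphoton_mass` is a hypothetical helper, not part of this repository.

```python
import math


def diphoton_mass(e1_gev: float, e2_gev: float, opening_angle_rad: float) -> float:
    """Invariant mass of two massless particles: m^2 = 2*E1*E2*(1 - cos(theta))."""
    m_squared = 2.0 * e1_gev * e2_gev * (1.0 - math.cos(opening_angle_rad))
    return math.sqrt(m_squared)


# Back-to-back photons: opening angle of pi radians.
mass = diphoton_mass(62.5, 62.5, math.pi)
print(f"{mass:.1f} GeV")  # prints "125.0 GeV"
```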
### 4-bit Quantized Loading (bitsandbytes)
```bash
# Install Transformers (pinned release); in a notebook, prefix each command with `!`
pip install -U transformers==5.5.0
# Install remaining dependencies
pip install -U accelerate bitsandbytes sentencepiece protobuf peft trl
# Optional
pip install -U unsloth
```
```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch

model_name = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"

# Quantization config: 4-bit NF4 weights with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=bnb_config,
)

prompt = "Explain what a jet detector is in particle physics."
messages = [
    {"role": "user", "content": prompt}
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(
    text,
    return_tensors="pt",
).to(model.device)

# Generate with nucleus sampling
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.5,
        do_sample=True,
        top_p=0.9,
    )

# Decode the full sequence (prompt plus completion)
response = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
print(response)
```
### vLLM Server (Recommended for Production)
```bash
# Install vLLM with ROCm support
pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm7.0
# Launch server
vllm serve rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x \
    --dtype bfloat16 \
    --max-model-len 4096 \
    --port 8000
```
```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x",
    messages=[{"role": "user", "content": "Explain the CMS detector architecture."}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```
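
If the `openai` package is not available, the same OpenAI-compatible endpoint can be called with the standard library alone. This is a minimal sketch assuming the server was launched as shown above on port 8000. `build_chat_payload` and `ask` are hypothetical helper names, and the system prompt and `temperature` value are illustrative choices, not settings taken from this card's evaluation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # matches --port 8000 in the launch command
MODEL_ID = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"


def build_chat_payload(question: str, system_prompt: str, max_tokens: int = 500) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # illustrative: deterministic-leaning output
    }


def ask(question: str, system_prompt: str = "You are an expert particle physicist.") -> str:
    """POST the payload to the running server and return the assistant text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(question, system_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as reply:
        body = json.load(reply)
    return body["choices"][0]["message"]["content"]


# Usage (requires the vLLM server to be running):
# print(ask("Explain the CMS detector architecture."))
```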
## Limitations
- Knowledge cutoff reflects the training data (primarily pre-2025 papers), so coverage of very recent experimental results is limited
- May hallucinate specific numerical values; always verify against the PDG (e.g., pdgLive)
- Not trained for function-calling or tool-use tasks
- Quantitative calculations: the reasoning approach is often correct, but strict exact-match scores
are low on the small 8-problem test set; verify numerical outputs independently
- Several planned benchmarks (including GSM8K, HellaSwag, and TruthfulQA) could not run due to harness
infrastructure issues; results will be added in a follow-up evaluation
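
One way to verify a quantitative answer independently is to recompute it from a closed-form expression. The sketch below does this for the Z → e⁺e⁻ decay-at-rest prompt from this card's widget examples, using the standard two-body momentum formula in natural units. The mass values are approximate and should be checked against the PDG, and `two_body_momentum` is a hypothetical helper name.

```python
import math


def two_body_momentum(parent_mass: float, m1: float, m2: float) -> float:
    """Momentum of either daughter in the parent rest frame (natural units, GeV)."""
    if parent_mass < m1 + m2:
        raise ValueError("decay is kinematically forbidden")
    # p = sqrt((M^2 - (m1 + m2)^2) * (M^2 - (m1 - m2)^2)) / (2 M)
    term_sum = parent_mass**2 - (m1 + m2) ** 2
    term_diff = parent_mass**2 - (m1 - m2) ** 2
    return math.sqrt(term_sum * term_diff) / (2.0 * parent_mass)


M_Z = 91.1876          # GeV, approximate; check the PDG for the current value
M_ELECTRON = 0.000511  # GeV, approximate

p_electron = two_body_momentum(M_Z, M_ELECTRON, M_ELECTRON)
print(f"{p_electron:.2f} GeV")
```

Because the electron mass is negligible at this scale, the result is very close to half the Z mass, about 45.6 GeV.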
## Citation
```bibtex
@misc{hep-agent-mi300x-2026,
title = {HEP-Agent: Full Fine-Tuning of Qwen/Qwen3.5-9B on High Energy Physics Data},
author = {Rathod, Rajveer},
year = {2026},
howpublished = {\url{https://huggingface.co/rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x}},
note = {Fine-tuned on AMD MI300X (ROCm 7.0) using Unsloth acceleration}
}
```
## License
Apache License 2.0.
Base model weights are subject to their own license:
[Qwen/Qwen3.5-9B License](https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE)