hep-agent-qwen-qwen3-5-9b-mi300x

HEP domain expert: Qwen/Qwen3.5-9B fine-tuned on High Energy Physics data.

This model is a full fine-tune of Qwen/Qwen3.5-9B on a curated corpus of High Energy Physics literature, experimental data, and synthetic Q&A. It was trained on a single AMD MI300X GPU (192 GB HBM3, ROCm 7.0).

Model Overview

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-9B |
| Fine-tuning type | Full fine-tune (NOT LoRA) |
| Hardware | 1× AMD MI300X (192 GB HBM3, ROCm 7.0) |
| Precision | bfloat16 |
| Context length | 2048 tokens |
| Training data | ~50K–100K HEP examples |
| Optimizer | AdamW 8-bit (bitsandbytes) |
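A rough estimate of why the 8-bit optimizer matters for a full fine-tune on a single 192 GB card. This is a back-of-envelope sketch under simplifying assumptions (a nominal 9 × 10⁹ parameters, bf16 weights and gradients, no fp32 master copy, activations and framework overhead excluded), not a measurement from this training run; the function name is illustrative.

```python
def full_finetune_memory_gb(n_params: int, optimizer_bytes_per_param: int) -> float:
    """Rough memory in GB (1e9 bytes) for bf16 weights, bf16 gradients and AdamW state.

    ASSUMPTION: ignores activations, fp32 master weights, and framework overhead.
    """
    weights = 2 * n_params  # bf16 weights: 2 bytes per parameter
    grads = 2 * n_params    # bf16 gradients: 2 bytes per parameter
    optim = optimizer_bytes_per_param * n_params
    return (weights + grads + optim) / 1e9


N = 9_000_000_000  # ASSUMPTION: nominal parameter count of a "9B" model

# AdamW keeps two moment buffers per parameter:
# 32-bit states: 2 x 4 bytes; 8-bit states: 2 x 1 byte (block-wise scales ignored).
adamw_32bit_gb = full_finetune_memory_gb(N, optimizer_bytes_per_param=8)
adamw_8bit_gb = full_finetune_memory_gb(N, optimizer_bytes_per_param=2)
print(f"AdamW 32-bit: ~{adamw_32bit_gb:.0f} GB, AdamW 8-bit: ~{adamw_8bit_gb:.0f} GB")
```

Under these assumptions the static footprint halves (about 108 GB to 54 GB), which leaves more of the card's memory for activations.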

Evaluation Results

All scores are accuracy (%) unless noted otherwise and are compared against the unmodified Qwen/Qwen3.5-9B base model.

General Benchmarks

| Benchmark | Shots | Metric | Base (%) | Fine-tuned (%) | Δ (pp) |
|---|---|---|---|---|---|
| MMLU (full) | 5 | acc | 69.8 | 70.6 | +0.7 |
| ARC-Challenge | 25 | acc_norm | 71.1 | 71.8 | +0.7 |

No significant regressions were detected (threshold: −3 pp).
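One way to apply the stated threshold, assuming "threshold: −3 pp" means that a drop of more than 3 percentage points on any benchmark counts as a regression. The function name is illustrative, and the deltas here are computed from the rounded table values.

```python
THRESHOLD_PP = -3.0  # ASSUMPTION: a drop larger than 3 percentage points is a regression


def is_regression(base: float, tuned: float, threshold_pp: float = THRESHOLD_PP) -> bool:
    """True if the fine-tuned score falls below the base score by more than the threshold."""
    return (tuned - base) < threshold_pp


# (base, fine-tuned) accuracy in %, from the table above
results = {
    "MMLU": (69.8, 70.6),
    "ARC-Challenge": (71.1, 71.8),
}
regressions = [name for name, (b, t) in results.items() if is_regression(b, t)]
print(regressions)
```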

MMLU Physics Subsets (extracted from MMLU Full run)

| Subset | Base (%) | Fine-tuned (%) | Δ (pp) |
|---|---|---|---|
| Conceptual Physics | 77.0 | 77.9 | +0.9 |
| College Physics | 57.8 | 58.8 | +1.0 |
| High School Physics | 60.9 | 62.9 | +2.0 |
| Astronomy | 80.3 | 80.9 | +0.7 |
| Physics avg | 69.0 | 70.1 | +1.1 |

MMLU STEM aggregate: Base 68.3% → Fine-tuned 68.7% (+0.4 pp).
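The "Physics avg" row is consistent with an unweighted (macro) mean over the four subsets, where each subset counts equally regardless of its number of questions; a mean weighted by question count would give a different figure. A minimal check using the table values:

```python
from statistics import mean

# Subset accuracies (%) from the MMLU physics table
base = {"Conceptual Physics": 77.0, "College Physics": 57.8,
        "High School Physics": 60.9, "Astronomy": 80.3}
tuned = {"Conceptual Physics": 77.9, "College Physics": 58.8,
         "High School Physics": 62.9, "Astronomy": 80.9}

# Unweighted mean: every subset contributes equally
base_avg = mean(list(base.values()))
tuned_avg = mean(list(tuned.values()))
print(f"{base_avg:.1f} -> {tuned_avg:.1f} ({tuned_avg - base_avg:+.1f} pp)")
```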

Custom Physics Calculations (8 problems)

| Category | Base (%) | Fine-tuned (%) |
|---|---|---|
| Four-vectors | 50.0 | 50.0 |
| Invariant mass | 0.0 | 0.0 |
| Decay kinematics | 0.0 | 0.0 |
| Branching ratios | 0.0 | 0.0 |
| Kinematics (pT/η) | 0.0 | 0.0 |
| Overall (exact match) | 12.5 | 12.5 |

Note: This custom benchmark covers only 8 problems and uses strict exact-match numeric scoring. Both models demonstrate correct reasoning in the response text but often fail the final answer-extraction step (e.g., outputting an intermediate value rather than the final result in the expected units). A lenient scoring pass would yield higher effective accuracy. The benchmark will be expanded in a future evaluation run.
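A lenient pass could, for example, take the last number in the response and accept it within a relative tolerance. The sketch below is hypothetical: the function names, the last-number heuristic, and the 1% tolerance are assumptions rather than the scoring used for the table above, and units are not parsed.

```python
import math
import re
from typing import Optional

# Matches integers, decimals, and scientific notation such as 1.5e-3.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number appearing in text, or None if there is none."""
    matches = _NUMBER.findall(text)
    return float(matches[-1]) if matches else None


def lenient_match(response: str, expected: float, rel_tol: float = 0.01) -> bool:
    """HYPOTHETICAL scorer: correct if the final number is within rel_tol of expected."""
    value = last_number(response)
    return value is not None and math.isclose(value, expected, rel_tol=rel_tol)
```

The heuristic still fails when a response ends by restating an input value, so borderline cases benefit from manual review.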

Benchmarks Not Yet Available

The following benchmarks encountered infrastructure errors during this evaluation run and will be included in a future update:

| Benchmark | Intended purpose | Blocker |
|---|---|---|
| SciQ | Science Q&A | HF dataset URI format incompatibility |
| GSM8K | Math reasoning | HF dataset URI format incompatibility |
| TruthfulQA mc1/mc2 | Hallucination resistance | HF dataset URI format incompatibility |
| HellaSwag | Commonsense forgetting check | HF dataset URI format incompatibility |
| IFEval | Instruction following | Missing immutabledict package |
| Minerva MATH | Advanced math | Missing antlr4 package (LaTeX parsing) |
| BBQ | Bias evaluation | Task not registered in harness version |
| HEP-QA (held-out) | Domain Q&A | Evaluation module path error |

Intended Use

This model is designed for:

  • Answering questions about experimental and theoretical particle physics
  • Explaining detector physics, collision analysis, and data analysis
  • Solving quantitative physics problems (kinematics, cross-sections, decay calculations)
  • Summarizing HEP papers and explaining their methodology

Not intended for:

  • Real-time experimental analysis or ROOT file processing
  • Safety-critical applications
  • Medical or regulatory decisions
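For the decay-calculation use case listed above, a model's answer can be checked against the standard two-body decay formula for the daughter momentum in the parent rest frame, p* = sqrt([M² − (m1 + m2)²][M² − (m1 − m2)²]) / (2M). A minimal sketch in natural units (c = 1); the function name is illustrative and the masses are rounded PDG values.

```python
import math


def two_body_momentum(M: float, m1: float, m2: float) -> float:
    """Daughter momentum in the parent rest frame for M -> m1 + m2 (natural units, c = 1)."""
    if M < m1 + m2:
        raise ValueError("decay is kinematically forbidden: M < m1 + m2")
    return math.sqrt((M**2 - (m1 + m2) ** 2) * (M**2 - (m1 - m2) ** 2)) / (2 * M)


# Example: pi+ -> mu+ nu_mu, masses in MeV (neutrino mass neglected)
p_star = two_body_momentum(139.57, 105.66, 0.0)
print(f"p* = {p_star:.2f} MeV")
```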

Training Data

| Source | Volume | Description |
|---|---|---|
| arXiv hep-ph / hep-ex | ~10K papers → Q&A | Theory, phenomenology, experimental |
| INSPIRE-HEP | ~15K records | Paper summaries, detector data |
| CMS Open Data | ~5K examples | Collision analysis, ROOT metadata |
| PDG (Particle Data Group) | ~3K entries | Particle properties, decay modes |
| Synthetic Q&A | ~20K generated | Kinematics, formulas, calculations |

Training Configuration

| Parameter | Value |
|---|---|
| learning_rate | 8e-06 |
| num_epochs | 2 |
| batch_size (effective) | 32 |
| sequence_length | 4096 |
| optimizer | adamw_8bit |
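The table gives only the effective batch size. A minimal sketch of the related arithmetic, assuming a hypothetical split of 4 sequences per step × 8 gradient-accumulation steps on the single GPU and roughly 53K training examples (the sum of the approximate volumes in the Training Data table). Neither figure is stated in this card, and the function names are illustrative.

```python
import math


def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to one optimizer step."""
    return micro_batch * grad_accum_steps * num_devices


def steps_per_epoch(num_examples: int, effective_batch: int) -> int:
    """Optimizer steps per epoch; a final partial batch counts as one step."""
    return math.ceil(num_examples / effective_batch)


# HYPOTHETICAL split that reproduces the reported effective batch size of 32
eff = effective_batch_size(micro_batch=4, grad_accum_steps=8, num_devices=1)
seq_len = 4096          # sequence_length from the table
num_examples = 53_000   # ASSUMPTION: sum of the approximate source volumes
epochs = 2              # num_epochs from the table

max_tokens_per_step = eff * seq_len  # upper bound: every sequence filled to seq_len
total_steps = epochs * steps_per_epoch(num_examples, eff)
print(eff, max_tokens_per_step, total_steps)
```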

Usage

Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
    device_map="auto",
    trust_remote_code=True,
)

# Qwen models use the ChatML prompt format; tokenizer.apply_chat_template
# can also produce this format from a list of messages.
prompt = """<|im_start|>system
You are an expert particle physicist.<|im_end|>
<|im_start|>user
What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Greedy decoding for reproducible answers
    output = model.generate(**inputs, max_new_tokens=300, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
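To check answers to prompts like this one independently, the invariant mass of a system follows from M² = (ΣE)² − |Σp|² in natural units. A minimal sketch with illustrative function names, where four-vectors are (E, px, py, pz) tuples; pseudorapidity is undefined for particles along the beam axis (pT = 0).

```python
import math

FourVector = tuple[float, float, float, float]  # (E, px, py, pz), natural units


def invariant_mass(*vectors: FourVector) -> float:
    """M = sqrt((sum E)^2 - |sum p|^2) for a system of particles."""
    E = sum(v[0] for v in vectors)
    px = sum(v[1] for v in vectors)
    py = sum(v[2] for v in vectors)
    pz = sum(v[3] for v in vectors)
    m2 = E**2 - (px**2 + py**2 + pz**2)
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding


def transverse_momentum(px: float, py: float) -> float:
    """pT = sqrt(px^2 + py^2)."""
    return math.hypot(px, py)


def pseudorapidity(px: float, py: float, pz: float) -> float:
    """eta = asinh(pz / pT), equivalent to -ln tan(theta / 2); requires pT > 0."""
    return math.asinh(pz / transverse_momentum(px, py))


# The prompt above: two 62.5 GeV photons, back-to-back along z
photon1 = (62.5, 0.0, 0.0, 62.5)
photon2 = (62.5, 0.0, 0.0, -62.5)
print(invariant_mass(photon1, photon2))
```

For the back-to-back photons the momenta cancel, so M equals the total energy, 125 GeV.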

4-bit Quantized Generation (bitsandbytes)

```bash
# In a Jupyter or Colab notebook, prefix each command with !

# Install Transformers (pinned version)
pip install -U transformers==5.5.0

# Install remaining deps (peft and trl are not used by the inference code below)
pip install -U accelerate bitsandbytes sentencepiece protobuf peft trl

# Optional
pip install -U unsloth
```

```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch

model_name = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"

# 4-bit NF4 quantization; double quantization also compresses the scaling constants
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # torch.bfloat16 also works on GPUs that support it
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=bnb_config,
)

prompt = "Explain what a jet detector is in particle physics."

messages = [
    {"role": "user", "content": prompt}
]

# Apply the chat template and append the assistant turn marker
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(
    text,
    return_tensors="pt",
).to(model.device)

# Generate with nucleus sampling
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.5,
        do_sample=True,
        top_p=0.9,
    )

# Decode only the new tokens; outputs[0] also contains the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

print(response)
```
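A rough sense of what 4-bit loading saves on weights alone. This sketch assumes a nominal 9 × 10⁹ parameters and counts only raw weight bits; in practice NF4 stores extra scaling constants, some modules may be kept in higher precision, and the KV cache and activations add more. The function name is illustrative.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1e9 bytes); excludes KV cache and activations."""
    return n_params * bits_per_param / 8 / 1e9


N = 9e9  # ASSUMPTION: nominal parameter count
print(f"bf16: {weight_memory_gb(N, 16):.1f} GB, 4-bit: {weight_memory_gb(N, 4):.1f} GB")
```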

vLLM Server (Recommended for Production)

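```bash
# Install vLLM with ROCm support. If the installed wheel lacks ROCm support
# on your system, see the ROCm section of the vLLM installation docs.
pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm7.0

# Launch an OpenAI-compatible server
vllm serve rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --port 8000
```

```python
# Query the server with the OpenAI client (pip install openai)
from openai import OpenAI

# vLLM does not check the API key unless the server was started with --api-key
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
response = client.chat.completions.create(
    model="rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x",
    messages=[{"role": "user", "content": "Explain the CMS detector architecture."}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```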

Limitations

  • Knowledge cutoff reflects training data (primarily pre-2025 papers)
  • May hallucinate specific numerical values; always verify them against the PDG (Review of Particle Physics or PDG Live)
  • Not trained for function-calling or tool-use tasks
  • Quantitative calculations: the reasoning approach is often correct, but strict exact-match scores on the small custom test set are low; verify numerical outputs independently
  • Limited coverage of very recent experimental results
  • Several planned benchmarks (GSM8K, HellaSwag, TruthfulQA) could not run due to harness infrastructure issues; results will be added in a follow-up evaluation

Citation

@misc{hep-agent-mi300x-2026,
  title        = {HEP-Agent: Full Fine-Tuning of Qwen/Qwen3.5-9B on High Energy Physics Data},
  author       = {Rathod, Rajveer},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x}},
  note         = {Fine-tuned on AMD MI300X (ROCm 7.0) using Unsloth acceleration}
}

License

Apache License 2.0.

Base model weights are subject to their own license: Qwen/Qwen3.5-9B License

