---
language: en
license: apache-2.0
tags:
- physics
- high-energy-physics
- hep
- particle-physics
- fine-tuned
- qwen3.5
- amd-mi300x
- rocm
base_model: Qwen/Qwen3.5-9B
datasets:
- arxiv-hep
- inspire-hep
- cms-open-data
- pdg-particle-data
pipeline_tag: text-generation
library_name: transformers
model_size: 9B
widget:
- text: "What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?"
  example_title: Invariant mass calculation
- text: "Explain the CMS detector architecture and its main subsystems."
  example_title: Detector explanation
- text: "A Z boson decays at rest into an electron-positron pair. What is the electron momentum?"
  example_title: Decay kinematics
model-index:
- name: hep-agent-qwen-qwen3-5-9b-mi300x
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: cais/mmlu
      config: all
    metrics:
    - name: MMLU (5-shot)
      type: acc
      value: 70.6
      verified: false
  - task:
      type: text-generation
    dataset:
      name: ARC Challenge
      type: allenai/ai2_arc
      config: ARC-Challenge
    metrics:
    - name: ARC-Challenge (25-shot, norm)
      type: acc_norm
      value: 71.8
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU Conceptual Physics
      type: cais/mmlu
      config: conceptual_physics
    metrics:
    - name: MMLU Conceptual Physics (5-shot)
      type: acc
      value: 77.9
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU College Physics
      type: cais/mmlu
      config: college_physics
    metrics:
    - name: MMLU College Physics (5-shot)
      type: acc
      value: 58.8
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU High School Physics
      type: cais/mmlu
      config: high_school_physics
    metrics:
    - name: MMLU High School Physics (5-shot)
      type: acc
      value: 62.9
      verified: false
  - task:
      type: text-generation
    dataset:
      name: MMLU Astronomy
      type: cais/mmlu
      config: astronomy
    metrics:
    - name: MMLU Astronomy (5-shot)
      type: acc
      value: 80.9
      verified: false
---

# hep-agent-qwen-qwen3-5-9b-mi300x

> **HEP domain expert:** Fine-tuned Qwen/Qwen3.5-9B on High Energy Physics data.

This model is a full fine-tune of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on a curated corpus of High Energy Physics literature, experimental data, and synthetic Q&A. Trained on a single AMD MI300X (192 GB HBM3, ROCm 7.0).

## Model Overview

| Property | Value |
|----------|-------|
| Base model | `Qwen/Qwen3.5-9B` |
| Fine-tuning type | Full fine-tune (NOT LoRA) |
| Hardware | 1× AMD MI300X (192 GB HBM3, ROCm 7.0) |
| Precision | bfloat16 |
| Context length | 2048 tokens |
| Training data | ~50K–100K HEP examples |
| Optimizer | AdamW 8-bit (bitsandbytes) |

## Evaluation Results

All scores are accuracy (%) unless noted. Comparison against the unmodified `Qwen/Qwen3.5-9B` base.

### General Benchmarks

| Benchmark | Shots | Metric | Base (%) | Fine-tuned (%) | Δ (pp) |
|-----------|-------|--------|----------|----------------|--------|
| MMLU Full | 5 | acc | 69.8 | **70.6** | +0.7 |
| ARC-Challenge | 25 | acc_norm | 71.1 | **71.8** | +0.7 |

No significant regressions were detected (threshold: −3 pp).

### MMLU Physics Subsets (extracted from MMLU Full run)

| Subset | Base (%) | Fine-tuned (%) | Δ (pp) |
|--------|----------|----------------|--------|
| Conceptual Physics | 77.0 | **77.9** | +0.9 |
| College Physics | 57.8 | **58.8** | +1.0 |
| High School Physics | 60.9 | **62.9** | +2.0 |
| Astronomy | 80.3 | **80.9** | +0.7 |
| **Physics avg** | **69.0** | **70.1** | **+1.1** |

MMLU STEM aggregate: Base 68.3% → Fine-tuned 68.7% (+0.4 pp).

### Custom Physics Calculations (8 problems)

| Category | Base (%) | Fine-tuned (%) |
|----------|----------|----------------|
| Four-vectors | 50.0 | **50.0** |
| Invariant mass | 0.0 | 0.0 |
| Decay kinematics | 0.0 | 0.0 |
| Branching ratios | 0.0 | 0.0 |
| Kinematics (pT/η) | 0.0 | 0.0 |
| **Overall (exact match)** | **12.5** | **12.5** |

> **Note:** This custom benchmark covers only 8 problems and uses strict exact-match numeric scoring.
> Both models demonstrate correct reasoning in the response text but often fail the final
> answer-extraction step (e.g., outputting an intermediate value rather than the final result in the
> expected units). A lenient scoring pass would yield higher effective accuracy. The benchmark
> will be expanded in a future evaluation run.

### Benchmarks Not Yet Available

The following benchmarks encountered infrastructure errors during this evaluation run and will be included in a future update:

| Benchmark | Intended Purpose | Blocker |
|-----------|-----------------|---------|
| SciQ | Science Q&A | HF dataset URI format incompatibility |
| GSM8K | Math reasoning | HF dataset URI format incompatibility |
| TruthfulQA mc1/mc2 | Hallucination resistance | HF dataset URI format incompatibility |
| HellaSwag | Commonsense forgetting check | HF dataset URI format incompatibility |
| IFEval | Instruction following | Missing `immutabledict` package |
| Minerva MATH | Advanced math | Missing `antlr4` package (LaTeX parsing) |
| BBQ | Bias evaluation | Task not registered in harness version |
| HEP-QA (held-out) | Domain Q&A | Evaluation module path error |

## Intended Use

This model is designed for:

- Answering questions about experimental and theoretical particle physics
- Explaining detector physics, collision analysis, and data analysis
- Solving quantitative physics problems (kinematics, cross-sections, decay calculations)
- Summarizing HEP papers and explaining their methodology

**Not intended for:**

- Real-time experimental analysis or ROOT file processing
- Safety-critical applications
- Medical or regulatory decisions

## Training Data

| Source | Volume | Description |
|--------|--------|-------------|
| arXiv hep-ph / hep-ex | ~10K papers → Q&A | Theory, phenomenology, experimental |
| INSPIRE-HEP | ~15K records | Paper summaries, detector data |
| CMS Open Data | ~5K examples | Collision analysis, ROOT metadata |
| PDG (Particle Data Group) | ~3K entries | Particle properties, decay modes |
| Synthetic Q&A | ~20K generated | Kinematics, formulas, calculations |

## Training Configuration

| Parameter | Value |
|-----------|-------|
| learning_rate | 8e-06 |
| num_epochs | 2 |
| batch_size (effective) | 32 |
| sequence_length | 4096 |
| optimizer | adamw_8bit |

## Usage

### Basic Generation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# ChatML format (for Qwen base)
prompt = """<|im_start|>system
You are an expert particle physicist.<|im_end|>
<|im_start|>user
What is the invariant mass of two photons with energies 62.5 GeV each, traveling back-to-back?<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300, do_sample=False)

print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### 4-bit Quantized Loading (bitsandbytes)

```bash
# Install Transformers (pinned version)
# In a Jupyter/Colab notebook, prefix each pip command with "!"
pip install -U transformers==5.5.0

# Install remaining deps
pip install -U accelerate bitsandbytes sentencepiece protobuf peft trl

# Optional
pip install -U unsloth
```

```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch

model_name = "rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x"

# Quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=bnb_config,
)

prompt = "Explain what a jet detector is in particle physics."

messages = [
    {"role": "user", "content": prompt}
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(
    text,
    return_tensors="pt"
).to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.5,
        do_sample=True,
        top_p=0.9,
    )

# Decode only the newly generated tokens (skip the echoed prompt)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)

print(response)
```

### vLLM Server (Recommended for Production)

```bash
# Install vLLM with ROCm support
pip install vllm --extra-index-url https://download.pytorch.org/whl/rocm7.0

# Launch server
vllm serve rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x \
    --dtype bfloat16 \
    --max-model-len 4096 \
    --port 8000
```

```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x",
    messages=[{"role": "user", "content": "Explain the CMS detector architecture."}],
    max_tokens=500,
)

print(response.choices[0].message.content)
```

## Limitations

- Knowledge cutoff reflects the training data (primarily pre-2025 papers)
- May hallucinate specific numerical values; always verify against the PDG listings (pdgLive)
- Not trained for function-calling or tool-use tasks
- Quantitative calculations: the reasoning approach is often correct, but strict exact-match accuracy is low (12.5% on the 8-problem custom benchmark); verify numerical outputs independently
- Limited coverage of very recent experimental results
- Several planned benchmarks (including GSM8K, HellaSwag, and TruthfulQA) could not run due to harness infrastructure issues; results will be added in a follow-up evaluation

## Citation

```bibtex
@misc{hep-agent-mi300x-2026,
  title        = {HEP-Agent: Full Fine-Tuning of Qwen/Qwen3.5-9B on High Energy Physics Data},
  author       = {Rathod, Rajveer},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/rajveer43/hep-agent-qwen-qwen3-5-9b-mi300x}},
  note         = {Fine-tuned on AMD MI300X (ROCm 7.0) using Unsloth acceleration}
}
```

## License

Apache License 2.0. Base model weights are subject to their own license: [Qwen/Qwen3.5-9B License](https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE)
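
## Appendix: Checking Numerical Answers

The Limitations section asks readers to verify numerical outputs independently. Below is a minimal sketch of such a check for two of the widget prompts (the two-photon invariant mass and the Z → e⁺e⁻ decay at rest). It assumes natural units (c = 1) and uses Z and electron masses quoted from memory of the PDG listings, so confirm them on pdgLive before relying on them. The helper names `invariant_mass`, `two_body_momentum`, and `is_close` are hypothetical and are not part of the evaluation code behind this card; the 1% tolerance is likewise an illustrative choice, not the scoring rule used above.

```python
import math

# Masses in GeV. Illustrative values quoted from memory of the PDG listings;
# confirm against pdgLive before using them for anything serious.
M_Z = 91.1876
M_E = 0.000511


def invariant_mass(p4_a, p4_b):
    """Invariant mass of two four-vectors given as (E, px, py, pz), with c = 1."""
    e = p4_a[0] + p4_b[0]
    px = p4_a[1] + p4_b[1]
    py = p4_a[2] + p4_b[2]
    pz = p4_a[3] + p4_b[3]
    m2 = e * e - (px * px + py * py + pz * pz)
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(m2, 0.0))


def two_body_momentum(m_parent, m_a, m_b):
    """Momentum of either daughter when a parent at rest decays to masses m_a and m_b."""
    s = m_parent * m_parent
    kallen = (s - (m_a + m_b) ** 2) * (s - (m_a - m_b) ** 2)
    return math.sqrt(kallen) / (2.0 * m_parent)


def is_close(predicted, reference, rel_tol=0.01):
    """Lenient numeric comparison (hypothetical 1% relative tolerance)."""
    return math.isclose(predicted, reference, rel_tol=rel_tol)


# Two 62.5 GeV photons, back-to-back along z. Photons are massless, so |p| = E.
photon_1 = (62.5, 0.0, 0.0, 62.5)
photon_2 = (62.5, 0.0, 0.0, -62.5)
m_gg = invariant_mass(photon_1, photon_2)

# Z boson at rest decaying to an electron-positron pair.
p_e = two_body_momentum(M_Z, M_E, M_E)

print(f"m_gg = {m_gg:.1f} GeV")  # m_gg = 125.0 GeV
print(f"p_e  = {p_e:.2f} GeV")   # p_e  = 45.59 GeV
```

To check a model answer, parse the final number from the response and pass it to `is_close` together with the reference value, for example `is_close(124.9, m_gg)`. A tolerance-based comparison of this kind is one way to separate reasoning errors from formatting or rounding mismatches that strict exact-match scoring counts as failures.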