gpt-oss-20b-paper-preference-150k-v1-dpo-lr5e-6-ep1-beta0-1-lora16a32-seq1024

fine-tuned with verl for citation chunk grounding on paper/query pairs.

  • base model: openai/gpt-oss-20b
  • dataset: paperbd/paper_preference_150K-v1
  • training hyperparams: dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024 (DPO, learning rate 5e-6, 1 epoch, beta 0.1, LoRA rank 16 / alpha 32, max sequence length 1024)
  • local source folder: paperhound
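
the hyperparameter tag above packs the whole run configuration into one string. A minimal sketch of decoding it follows. The function name `parse_run_tag`, the output key names, and the field meanings (for example `lora16a32` read as LoRA rank 16 with alpha 32) are assumptions based on common naming conventions, not something this card confirms.

```python
import re


def parse_run_tag(tag: str) -> dict:
    """Decode a run tag such as 'dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024'.

    Hypothetical helper: field meanings are assumed from convention.
    """
    pattern = re.compile(
        r"(?P<method>[a-z]+)"                 # training method, e.g. dpo
        r"-lr(?P<lr>[0-9.]+e-?[0-9]+)"        # learning rate in scientific notation
        r"-ep(?P<epochs>\d+)"                 # number of epochs
        r"-beta(?P<beta>[0-9.]+)"             # DPO beta
        r"-lora(?P<rank>\d+)a(?P<alpha>\d+)"  # assumed: LoRA rank, then alpha
        r"-seq(?P<seq>\d+)"                   # assumed: max sequence length
    )
    match = pattern.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognized run tag: {tag!r}")
    return {
        "method": match["method"],
        "learning_rate": float(match["lr"]),
        "epochs": int(match["epochs"]),
        "beta": float(match["beta"]),
        "lora_rank": int(match["rank"]),
        "lora_alpha": int(match["alpha"]),
        "max_seq_len": int(match["seq"]),
    }


config = parse_run_tag("dpo-lr5e-6-ep1-beta0.1-lora16a32-seq1024")
print(config)
```

note that the repository name writes beta as `beta0-1`, while the tag in the list above uses `beta0.1`; this sketch only handles the dotted form.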

the dataset contains only the positive (cited) chunks, not the full arXiv paper haystack, so this model is trained to emit known supporting chunks for a paper/query pair rather than to search a full corpus for them.
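
to make "trained to emit known supporting chunks" concrete, here is a minimal sketch of the standard per-pair DPO loss with beta = 0.1, where the chosen response would be the cited chunk. This is the textbook objective, not necessarily the exact verl implementation used for this run. The function name `dpo_loss` and all log-probability values are hypothetical, and the card does not say what the rejected responses contain.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    # How much more the policy prefers chosen over rejected,
    # relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities, for illustration only.
at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy equals reference
after_update = dpo_loss(-8.0, -14.0, -10.0, -12.0)  # policy favors the cited chunk
print(at_start, after_update)
```

when the policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy raises the cited chunk's likelihood relative to the rejected response.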

model repo: Pradheep1647/gpt-oss-20b-paper-preference-150k-v1-dpo-lr5e-6-ep1-beta0-1-lora16a32-seq1024 (finetuned from the base model and trained on the dataset listed above)