Funding Extraction LoRA (Qwen3.5-9B)

LoRA adapter for extracting structured funding metadata (funder names and award IDs) from academic paper funding statements. Fine-tuned from Qwen3.5-9B with SFT followed by GRPO reinforcement learning.

This is the Qwen3.5-9B counterpart to cometadata/funding-extraction-llama-3.1-8b-instruct-artifact-data-mix-grpo-mixed-reward, trained with the same data, pipeline, and reward. See Comparison to the Llama 3.1 8B baseline below.

Training Pipeline

Trained on the Tinker training service using the cometadata/funding-extraction-artifact-data-mix-grpo-mixed-reward dataset and its pre-split sft / rl / test partitions.

Stage 1: Supervised Fine-Tuning (SFT)

  • Base model: Qwen/Qwen3.5-9B
  • Data (data/sft/): 3,528 real + 7,240 synthetic funding statements with gold-standard funder/award labels (synthetic upsampled 2×)
  • Data augmentation: 50% of training examples augmented with synthetic noise (OCR-like case errors, digit/letter swaps, Unicode artifacts, XML/HTML tags, LaTeX markup)
  • Renderer: qwen3_5_disable_thinking (no chain-of-thought; keep thinking disabled at inference, see Usage)
  • LoRA rank: 128
  • Epochs: 2
  • Result: eval NLL 0.116 → 0.0035 over 252 steps
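
The noise augmentation described above could look roughly like the following. This is a minimal sketch, not the pipeline's actual code: the function names, the confusion table, the per-character rates, the tag names, and the choice of one noise type per example are all assumptions for illustration, and Unicode artifacts are omitted for brevity.

```python
import random

# Hypothetical digit/letter confusion pairs (not taken from the training pipeline).
SWAPS = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", "S": "5"}


def flip_case(text, rng, rate=0.05):
    """OCR-like case errors: swap the case of a fraction of letters."""
    return "".join(
        c.swapcase() if c.isalpha() and rng.random() < rate else c for c in text
    )


def swap_digits_letters(text, rng, rate=0.1):
    """Replace look-alike digits and letters with each other."""
    return "".join(
        SWAPS[c] if c in SWAPS and rng.random() < rate else c for c in text
    )


def wrap_in_tag(text, rng):
    """Wrap the statement in an XML/HTML-style tag (tag names are made up)."""
    tag = rng.choice(["p", "ack", "funding-statement"])
    return f"<{tag}>{text}</{tag}>"


def wrap_in_latex(text, rng):
    """Wrap the statement in a LaTeX command."""
    return "\\textit{" + text + "}"


NOISE_FUNCS = [flip_case, swap_digits_letters, wrap_in_tag, wrap_in_latex]


def augment(text, rng, p=0.5):
    """With probability p, apply one randomly chosen noise function."""
    if rng.random() >= p:
        return text
    return rng.choice(NOISE_FUNCS)(text, rng)


rng = random.Random(0)
statement = "Supported by NSF grant DMS-1613002."
noisy = [augment(statement, rng) for _ in range(4)]
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible across runs. How the gold labels are handled for noised inputs is not specified in this card and is not shown here.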

Stage 2: Reinforcement Learning (GRPO)

  • Algorithm: Group Relative Policy Optimization (GRPO) with importance sampling loss
  • Data (data/rl/): 1,160 real + 1,916 synthetic (train); 576 real + 968 synthetic (eval)
  • Reward: Hierarchical F0.5 scoring with binary funder/award-ID matching + flat award-ID association bonus
    • reward = 0.50 * funder_F0.5 + 0.40 * hierarchical_award_id_F0.5 + 0.10 * flat_award_id_F0.5
    • Funder matching — fuzzy (token_sort_ratio ≥ 0.80 threshold, Hungarian optimal assignment)
    • Award ID matching — binary exact after normalization (strip whitespace/hyphens/slashes, uppercase), with soft (edit-distance-1) partial credit during training
    • Flat award-ID term — awards partial credit when the correct award ID is extracted under the wrong funder, providing gradient on funder-award association errors
  • KL penalty: 0.03 (anchored to SFT checkpoint)
  • Group size: 8 rollouts per prompt
  • Temperature: 0.8
  • Learning rate: 3e-5
  • Steps: 193 batches
  • Checkpoint: final (batch 193)
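
To make the reward terms above concrete, here is a simplified sketch of the weighted combination. It is not the training code: funder matching is reduced to a case-insensitive exact match (a stand-in for the fuzzy token_sort_ratio matching with Hungarian assignment), the edit-distance-1 partial credit is omitted, the empty-set convention is an assumption, and the dict-based input shape and function names are hypothetical.

```python
import re


def normalize_award_id(award_id):
    """Strip whitespace, hyphens, and slashes, then uppercase."""
    return re.sub(r"[\s\-/]", "", award_id).upper()


def f_beta(pred, gold, beta=0.5):
    """F-beta over two sets; beta < 1 weights precision more than recall."""
    if not pred and not gold:
        return 1.0  # assumption: nothing expected and nothing predicted is perfect
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def reward(pred, gold):
    """pred / gold: dict mapping funder name -> list of award IDs (simplified)."""

    def funders(d):
        # Stand-in for fuzzy funder matching: exact match after casefolding.
        return {name.casefold().strip() for name in d}

    def pairs(d):
        # Hierarchical view: an award ID only counts under its own funder.
        return {
            (name.casefold().strip(), normalize_award_id(a))
            for name, ids in d.items()
            for a in ids
        }

    def flat(d):
        # Flat view: award IDs regardless of which funder they sit under.
        return {award_id for _, award_id in pairs(d)}

    return (
        0.50 * f_beta(funders(pred), funders(gold))
        + 0.40 * f_beta(pairs(pred), pairs(gold))
        + 0.10 * f_beta(flat(pred), flat(gold))
    )


gold = {"National Science Foundation": ["DMS-1613002"], "NIH": ["R01-AI123456"]}
# The NIH award ID is extracted but attached to a spurious funder.
pred = {
    "national science foundation": ["dms 1613002"],
    "NIH": [],
    "DOE": ["R01AI123456"],
}
score = reward(pred, gold)  # ≈ 0.657
```

In this example the misattributed award ID loses hierarchical credit but still earns the full flat term, which is the partial credit on funder-award association errors that the flat term is meant to provide.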

Evaluation Results

arxiv_test.jsonl (300 held-out examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9384 | 0.9362 | 0.9373 | 0.9379 | 0.9369 |
| Award ID | 0.9069 | 0.8909 | 0.8988 | 0.9037 | 0.8957 |
| Scheme | 0.7407 | 0.8264 | 0.7812 | 0.7564 | 0.7980 |
| Title | 0.9048 | 0.3958 | 0.5507 | 0.7197 | 0.4787 |
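
The F0.5 and F1.5 columns in this and the following tables are consistent with the standard F-beta definition, in which β < 1 weights precision more heavily and β > 1 weights recall more heavily. That the evaluation harness uses exactly this form is an assumption, but the reported values agree with it:

```latex
F_\beta = \frac{(1+\beta^2)\, P\, R}{\beta^2 P + R}, \qquad \beta \in \{0.5,\ 1,\ 1.5\}
```

For the Funder row above (P = 0.9384, R = 0.9362) this gives F1 ≈ 0.9373, F0.5 ≈ 0.9380, and F1.5 ≈ 0.9369, which matches the table to within the rounding of P and R.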

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8882 | 0.8960 | 0.8921 | 0.8897 | 0.8936 |
| Award ID | 0.8889 | 0.8732 | 0.8810 | 0.8857 | 0.8779 |
| Scheme | 0.6889 | 0.7686 | 0.7266 | 0.7035 | 0.7422 |
| Title | 0.9048 | 0.3958 | 0.5507 | 0.7197 | 0.4787 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8796 | 0.8874 | 0.8835 | 0.8812 | 0.8850 |
| Award ID | 0.8859 | 0.8702 | 0.8780 | 0.8827 | 0.8750 |
| Scheme | 0.6667 | 0.7438 | 0.7031 | 0.6808 | 0.7182 |
| Title | 0.8095 | 0.3542 | 0.4928 | 0.6439 | 0.4283 |

All 300 outputs were valid JSON.

funding-entity-extraction-dataset-mix test sets

Evaluated on the held-out test sets from cometadata/funding-entity-extraction-dataset-mix with the same evaluation harness. test_with_context uses the full_text field (the funding statement with its surrounding document text) as the model input.

test.jsonl (347 examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9376 | 0.8923 | 0.9144 | 0.9282 | 0.9058 |
| Award ID | 0.8407 | 0.8339 | 0.8373 | 0.8394 | 0.8360 |
| Scheme | 0.4118 | 0.5927 | 0.4860 | 0.4385 | 0.5221 |
| Title | 0.1034 | 0.0170 | 0.0293 | 0.0514 | 0.0229 |

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9008 | 0.8555 | 0.8776 | 0.8913 | 0.8689 |
| Award ID | 0.8138 | 0.8072 | 0.8105 | 0.8125 | 0.8092 |
| Scheme | 0.3725 | 0.5363 | 0.4397 | 0.3968 | 0.4724 |
| Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8722 | 0.8276 | 0.8493 | 0.8629 | 0.8408 |
| Award ID | 0.7963 | 0.7898 | 0.7930 | 0.7949 | 0.7918 |
| Scheme | 0.3333 | 0.4798 | 0.3934 | 0.3550 | 0.4227 |
| Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |

test_degraded.jsonl (1,288 examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9285 | 0.9216 | 0.9250 | 0.9271 | 0.9237 |
| Award ID | 0.8586 | 0.8560 | 0.8573 | 0.8581 | 0.8568 |
| Scheme | 0.7413 | 0.6704 | 0.7041 | 0.7260 | 0.6907 |
| Title | 0.7723 | 0.2267 | 0.3506 | 0.5214 | 0.2897 |

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9001 | 0.8906 | 0.8953 | 0.8981 | 0.8935 |
| Award ID | 0.8416 | 0.8390 | 0.8403 | 0.8411 | 0.8398 |
| Scheme | 0.6757 | 0.6110 | 0.6417 | 0.6617 | 0.6296 |
| Title | 0.6634 | 0.1948 | 0.3011 | 0.4479 | 0.2489 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8801 | 0.8690 | 0.8745 | 0.8778 | 0.8724 |
| Award ID | 0.8317 | 0.8291 | 0.8304 | 0.8312 | 0.8299 |
| Scheme | 0.6039 | 0.5461 | 0.5735 | 0.5913 | 0.5627 |
| Title | 0.6139 | 0.1802 | 0.2787 | 0.4144 | 0.2303 |

test_with_context.jsonl (322 examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9348 | 0.9383 | 0.9365 | 0.9355 | 0.9372 |
| Award ID | 0.8711 | 0.8690 | 0.8700 | 0.8707 | 0.8696 |
| Scheme | 0.7515 | 0.6844 | 0.7164 | 0.7371 | 0.7037 |
| Title | 0.8750 | 0.2442 | 0.3818 | 0.5769 | 0.3138 |

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9072 | 0.9061 | 0.9066 | 0.9070 | 0.9064 |
| Award ID | 0.8538 | 0.8517 | 0.8527 | 0.8534 | 0.8523 |
| Scheme | 0.6871 | 0.6257 | 0.6550 | 0.6739 | 0.6434 |
| Title | 0.7500 | 0.2093 | 0.3273 | 0.4945 | 0.2690 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8863 | 0.8842 | 0.8852 | 0.8859 | 0.8848 |
| Award ID | 0.8439 | 0.8418 | 0.8428 | 0.8434 | 0.8424 |
| Scheme | 0.6074 | 0.5531 | 0.5789 | 0.5957 | 0.5687 |
| Title | 0.7083 | 0.1977 | 0.3091 | 0.4670 | 0.2540 |

Comparison to the Llama 3.1 8B baseline

Balanced-mode F1 on the two test sets reported by the Llama baseline card:

arxiv_test (300 examples)

| Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
|---|---|---|---|
| Funder | 0.9001 | 0.8921 | −0.008 |
| Award ID | 0.8780 | 0.8810 | +0.003 |
| Scheme | 0.6466 | 0.7266 | +0.080 |
| Title | 0.5316 | 0.5507 | +0.019 |

test_degraded (1,288 examples)

| Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
|---|---|---|---|
| Funder | 0.8999 | 0.8953 | −0.005 |
| Award ID | 0.8477 | 0.8403 | −0.007 |
| Scheme | 0.6370 | 0.6417 | +0.005 |
| Title | 0.4110 | 0.3011 | −0.110 |

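Funder and award ID (the only fields that enter the reward) are within 0.008 F1 of the Llama baseline on both sets. Scheme and title carry zero reward weight, so the larger swings in those fields (up to +0.080 for scheme on arxiv_test and −0.110 for title on test_degraded) were not directly optimized by the GRPO reward.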

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ADAPTER_ID = "cometadata/funding-extraction-qwen3.5-9B-non-thinking-artifact-data-mix-grpo-mixed-reward"

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

prompt = """Extract funding information from the following statement:

This work was supported by the National Science Foundation under grant DMS-1613002 and by the NIH (R01-AI123456)."""

messages = [
    {"role": "system", "content": "You are an expert at extracting structured funding metadata from academic papers. Given a funding statement, extract all funders and their associated awards. Return a JSON array of funder objects. Each funder has:\n- \"funder_name\": string or null\n- \"awards\": array of objects with \"award_ids\" (array of strings), \"funding_scheme\" (array of strings), and \"award_title\" (array of strings)\nReturn ONLY the JSON array, no other text."},
    {"role": "user", "content": prompt},
]

# Model trained with thinking disabled; keep enable_thinking=False.
# return_dict=True yields input_ids plus attention_mask, which generate() expects as keyword arguments.
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
    enable_thinking=False,
).to(model.device)

# Greedy decoding; slice off the prompt tokens so only the generated JSON is printed.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))

Output Format

[
  {
    "funder_name": "National Science Foundation",
    "awards": [
      {
        "award_ids": ["DMS-1613002"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  },
  {
    "funder_name": "NIH",
    "awards": [
      {
        "award_ids": ["R01-AI123456"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  }
]
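
Downstream code usually needs the output as flat records rather than nested JSON. The following sketch parses the array above into (funder_name, award_id) pairs; the function name and the choice to emit `None` for funders without award IDs are assumptions for illustration, not part of the model or its evaluation harness.

```python
import json


def parse_funders(raw):
    """Parse the model's JSON output into (funder_name, award_id) pairs.

    Raises ValueError if the text is not valid JSON or is not an array of
    objects (json.JSONDecodeError is itself a subclass of ValueError).
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of funder objects")
    pairs = []
    for funder in data:
        if not isinstance(funder, dict):
            raise ValueError("expected each array element to be an object")
        name = funder.get("funder_name")
        award_ids = [
            award_id
            for award in funder.get("awards") or []
            for award_id in award.get("award_ids") or []
        ]
        if award_ids:
            pairs.extend((name, award_id) for award_id in award_ids)
        else:
            # Keep funders that were mentioned without any award ID.
            pairs.append((name, None))
    return pairs


example = """[
  {"funder_name": "National Science Foundation",
   "awards": [{"award_ids": ["DMS-1613002"], "funding_scheme": [], "award_title": []}]},
  {"funder_name": "NIH",
   "awards": [{"award_ids": ["R01-AI123456"], "funding_scheme": [], "award_title": []}]}
]"""
records = parse_funders(example)
```

Catching `ValueError` around the call is enough to handle both malformed JSON and an unexpected top-level shape.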