Funding Extraction LoRA (Qwen3.5-9B)
LoRA adapter for extracting structured funding metadata (funder names + award IDs) from academic paper funding statements. Fine-tuned on Qwen3.5-9B via SFT then GRPO reinforcement learning.
This is the Qwen3.5-9B counterpart to cometadata/funding-extraction-llama-3.1-8b-instruct-artifact-data-mix-grpo-mixed-reward, trained with the same data, pipeline, and reward. See Comparison to the Llama 3.1 8B baseline below.
Training Pipeline
Trained on the cometadata/funding-extraction-artifact-data-mix-grpo-mixed-reward dataset, using its predefined sft / rl / test splits, on the Tinker training service.
Stage 1: Supervised Fine-Tuning (SFT)
- Base model: Qwen/Qwen3.5-9B
- Data (data/sft/): 3,528 real + 7,240 synthetic funding statements with gold-standard funder/award labels (synthetic upsampled 2×)
- Data augmentation: 50% of training examples augmented with synthetic noise (OCR-like case errors, digit/letter swaps, Unicode artifacts, XML/HTML tags, LaTeX markup)
- Renderer: qwen3_5_disable_thinking (no chain-of-thought; keep thinking disabled at inference, see Usage)
- LoRA rank: 128
- Epochs: 2
- Result: eval NLL 0.116 → 0.0035 over 252 steps
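The noise augmentation in the list above can be pictured with a small sketch. Only the 50% application rate and the noise categories come from the list; the function names, per-character rates, tag choices, and the confusable-character table are illustrative assumptions, not the training pipeline's actual code.

```python
import random

# Hypothetical table of visually confusable digit/letter pairs (an assumption).
CONFUSABLE = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", "S": "5"}


def flip_case(text: str, rng: random.Random, rate: float = 0.05) -> str:
    """OCR-like case errors: flip the case of each character with probability `rate`."""
    return "".join(c.swapcase() if rng.random() < rate else c for c in text)


def swap_confusables(text: str, rng: random.Random, rate: float = 0.05) -> str:
    """Digit/letter swaps: replace confusable characters with probability `rate`."""
    return "".join(
        CONFUSABLE[c] if c in CONFUSABLE and rng.random() < rate else c for c in text
    )


def wrap_in_tag(text: str, rng: random.Random) -> str:
    """XML/HTML residue: wrap one randomly chosen word in an inline tag."""
    words = text.split(" ")
    i = rng.randrange(len(words))
    tag = rng.choice(["i", "b", "sup"])
    words[i] = f"<{tag}>{words[i]}</{tag}>"
    return " ".join(words)


NOISE_FUNCTIONS = [flip_case, swap_confusables, wrap_in_tag]


def augment(text: str, rng: random.Random, p: float = 0.5) -> str:
    """Apply one randomly chosen noise function to a fraction `p` of examples."""
    if rng.random() >= p:
        return text  # example left clean
    return rng.choice(NOISE_FUNCTIONS)(text, rng)


rng = random.Random(0)
statement = "Supported by the National Science Foundation under grant DMS-1613002."
print(augment(statement, rng, p=1.0))
```

The gold labels stay untouched in this scheme; only the input text is corrupted, so the model learns to recover clean funder names and award IDs from noisy statements.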
Stage 2: Reinforcement Learning (GRPO)
- Algorithm: Group Relative Policy Optimization (GRPO) with importance sampling loss
- Data (data/rl/): 1,160 real + 1,916 synthetic (train); 576 real + 968 synthetic (eval)
- Reward: Hierarchical F0.5 scoring with binary funder/award-ID matching + flat award-ID association bonus
  reward = 0.50 * funder_F0.5 + 0.40 * hierarchical_award_id_F0.5 + 0.10 * flat_award_id_F0.5
  - Funder matching: fuzzy (token_sort_ratio ≥ 0.80 threshold, Hungarian optimal assignment)
  - Award ID matching: binary exact after normalization (strip whitespace/hyphens/slashes, uppercase), with soft (edit-distance-1) partial credit during training
  - Flat award-ID term: awards partial credit when the correct award ID is extracted under the wrong funder, providing gradient on funder-award association errors
- KL penalty: 0.03 (anchored to SFT checkpoint)
- Group size: 8 rollouts per prompt
- Temperature: 0.8
- Learning rate: 3e-5
- Steps: 193 batches
- Checkpoint: final (batch 193)
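As a rough illustration of how the three reward terms combine, here is a minimal sketch under simplifying assumptions. The weights and the award-ID normalization follow the list above. Funder matching is reduced to exact equality on lowercased, token-sorted names, standing in for the fuzzy token_sort_ratio matching with Hungarian assignment, and the edit-distance-1 partial credit is omitted. All function names are hypothetical.

```python
import re

W_FUNDER, W_HIERARCHICAL, W_FLAT = 0.50, 0.40, 0.10  # weights from the reward formula


def normalize_award_id(award_id: str) -> str:
    """Strip whitespace, hyphens, and slashes, then uppercase."""
    return re.sub(r"[\s\-/]", "", award_id).upper()


def funder_key(name) -> str:
    """Simplified stand-in for fuzzy matching: lowercased, token-sorted name."""
    return " ".join(sorted((name or "").lower().split()))


def f_beta(tp: int, n_pred: int, n_gold: int, beta: float = 0.5) -> float:
    if n_pred == 0 and n_gold == 0:
        return 1.0  # assumption: an empty prediction for an empty gold set is perfect
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def set_f_beta(pred: set, gold: set) -> float:
    return f_beta(len(pred & gold), len(pred), len(gold))


def award_pairs(funders: list) -> set:
    """Collect (funder key, normalized award ID) pairs from a funder list."""
    return {
        (funder_key(f.get("funder_name")), normalize_award_id(award_id))
        for f in funders
        for award in f.get("awards", [])
        for award_id in award.get("award_ids", [])
    }


def reward(pred: list, gold: list) -> float:
    pred_pairs, gold_pairs = award_pairs(pred), award_pairs(gold)
    funder_score = set_f_beta(
        {funder_key(f.get("funder_name")) for f in pred},
        {funder_key(f.get("funder_name")) for f in gold},
    )
    # Hierarchical term: an ID only counts under the correct funder.
    hierarchical_score = set_f_beta(pred_pairs, gold_pairs)
    # Flat term: an ID counts regardless of which funder it sits under.
    flat_score = set_f_beta(
        {award_id for _, award_id in pred_pairs},
        {award_id for _, award_id in gold_pairs},
    )
    return W_FUNDER * funder_score + W_HIERARCHICAL * hierarchical_score + W_FLAT * flat_score


gold = [
    {"funder_name": "National Science Foundation", "awards": [{"award_ids": ["DMS-1613002"]}]},
    {"funder_name": "NIH", "awards": [{"award_ids": ["R01-AI123456"]}]},
]
swapped = [
    {"funder_name": "National Science Foundation", "awards": [{"award_ids": ["R01-AI123456"]}]},
    {"funder_name": "NIH", "awards": [{"award_ids": ["DMS 1613002"]}]},
]
print(reward(gold, gold), reward(swapped, gold))
```

In this sketch a prediction that finds both funders and both award IDs but attaches each ID to the wrong funder loses the whole hierarchical term yet keeps the flat term, so it scores about 0.6 instead of 0.5. That gap is the signal on funder-award association errors that the flat term is meant to provide.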
Evaluation Results
arxiv_test.jsonl (300 held-out examples)
Permissive (partial_ratio + token_set, no damping)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9384 | 0.9362 | 0.9373 | 0.9379 | 0.9369 |
| Award ID | 0.9069 | 0.8909 | 0.8988 | 0.9037 | 0.8957 |
| Scheme | 0.7407 | 0.8264 | 0.7812 | 0.7564 | 0.7980 |
| Title | 0.9048 | 0.3958 | 0.5507 | 0.7197 | 0.4787 |
Balanced (length-damped + acronym detection)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8882 | 0.8960 | 0.8921 | 0.8897 | 0.8936 |
| Award ID | 0.8889 | 0.8732 | 0.8810 | 0.8857 | 0.8779 |
| Scheme | 0.6889 | 0.7686 | 0.7266 | 0.7035 | 0.7422 |
| Title | 0.9048 | 0.3958 | 0.5507 | 0.7197 | 0.4787 |
Strict (token_sort_ratio only)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8796 | 0.8874 | 0.8835 | 0.8812 | 0.8850 |
| Award ID | 0.8859 | 0.8702 | 0.8780 | 0.8827 | 0.8750 |
| Scheme | 0.6667 | 0.7438 | 0.7031 | 0.6808 | 0.7182 |
| Title | 0.8095 | 0.3542 | 0.4928 | 0.6439 | 0.4283 |
All 300 outputs were valid JSON.
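The F1, F0.5, and F1.5 columns follow from P and R through the standard F-beta formula, where beta below 1 weights precision more heavily and beta above 1 weights recall more heavily. A minimal sketch that recomputes the Permissive Funder row above (the function name is illustrative, and the recomputed values can differ from the table in the last digit because P and R are already rounded):

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Permissive Funder row on arxiv_test: P = 0.9384, R = 0.9362.
precision, recall = 0.9384, 0.9362
for beta in (1.0, 0.5, 1.5):
    print(f"F{beta:g} = {f_beta(precision, recall, beta):.4f}")
```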
funding-entity-extraction-dataset-mix test sets
Evaluated on the held-out test sets from cometadata/funding-entity-extraction-dataset-mix with the same evaluation harness. test_with_context uses the full_text field (the funding statement with its surrounding document text) as the model input.
test.jsonl (347 examples)
Permissive (partial_ratio + token_set, no damping)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9376 | 0.8923 | 0.9144 | 0.9282 | 0.9058 |
| Award ID | 0.8407 | 0.8339 | 0.8373 | 0.8394 | 0.8360 |
| Scheme | 0.4118 | 0.5927 | 0.4860 | 0.4385 | 0.5221 |
| Title | 0.1034 | 0.0170 | 0.0293 | 0.0514 | 0.0229 |
Balanced (length-damped + acronym detection)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9008 | 0.8555 | 0.8776 | 0.8913 | 0.8689 |
| Award ID | 0.8138 | 0.8072 | 0.8105 | 0.8125 | 0.8092 |
| Scheme | 0.3725 | 0.5363 | 0.4397 | 0.3968 | 0.4724 |
| Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |
Strict (token_sort_ratio only)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8722 | 0.8276 | 0.8493 | 0.8629 | 0.8408 |
| Award ID | 0.7963 | 0.7898 | 0.7930 | 0.7949 | 0.7918 |
| Scheme | 0.3333 | 0.4798 | 0.3934 | 0.3550 | 0.4227 |
| Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |
test_degraded.jsonl (1,288 examples)
Permissive (partial_ratio + token_set, no damping)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9285 | 0.9216 | 0.9250 | 0.9271 | 0.9237 |
| Award ID | 0.8586 | 0.8560 | 0.8573 | 0.8581 | 0.8568 |
| Scheme | 0.7413 | 0.6704 | 0.7041 | 0.7260 | 0.6907 |
| Title | 0.7723 | 0.2267 | 0.3506 | 0.5214 | 0.2897 |
Balanced (length-damped + acronym detection)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9001 | 0.8906 | 0.8953 | 0.8981 | 0.8935 |
| Award ID | 0.8416 | 0.8390 | 0.8403 | 0.8411 | 0.8398 |
| Scheme | 0.6757 | 0.6110 | 0.6417 | 0.6617 | 0.6296 |
| Title | 0.6634 | 0.1948 | 0.3011 | 0.4479 | 0.2489 |
Strict (token_sort_ratio only)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8801 | 0.8690 | 0.8745 | 0.8778 | 0.8724 |
| Award ID | 0.8317 | 0.8291 | 0.8304 | 0.8312 | 0.8299 |
| Scheme | 0.6039 | 0.5461 | 0.5735 | 0.5913 | 0.5627 |
| Title | 0.6139 | 0.1802 | 0.2787 | 0.4144 | 0.2303 |
test_with_context.jsonl (322 examples)
Permissive (partial_ratio + token_set, no damping)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9348 | 0.9383 | 0.9365 | 0.9355 | 0.9372 |
| Award ID | 0.8711 | 0.8690 | 0.8700 | 0.8707 | 0.8696 |
| Scheme | 0.7515 | 0.6844 | 0.7164 | 0.7371 | 0.7037 |
| Title | 0.8750 | 0.2442 | 0.3818 | 0.5769 | 0.3138 |
Balanced (length-damped + acronym detection)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.9072 | 0.9061 | 0.9066 | 0.9070 | 0.9064 |
| Award ID | 0.8538 | 0.8517 | 0.8527 | 0.8534 | 0.8523 |
| Scheme | 0.6871 | 0.6257 | 0.6550 | 0.6739 | 0.6434 |
| Title | 0.7500 | 0.2093 | 0.3273 | 0.4945 | 0.2690 |
Strict (token_sort_ratio only)
| Field | P | R | F1 | F0.5 | F1.5 |
|---|---|---|---|---|---|
| Funder | 0.8863 | 0.8842 | 0.8852 | 0.8859 | 0.8848 |
| Award ID | 0.8439 | 0.8418 | 0.8428 | 0.8434 | 0.8424 |
| Scheme | 0.6074 | 0.5531 | 0.5789 | 0.5957 | 0.5687 |
| Title | 0.7083 | 0.1977 | 0.3091 | 0.4670 | 0.2540 |
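The three matching modes in the tables above differ in how a predicted string is compared with a gold string. As a rough illustration of the Strict mode, the sketch below approximates a token-sorted comparison using only the standard library. It swaps in difflib's SequenceMatcher for the similarity function of a fuzzy-matching library such as RapidFuzz, so scores will not match the evaluation harness exactly. The 0.80 threshold is borrowed from the reward's funder matching and is an assumption here, since the harness threshold is not stated.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and sorting whitespace-separated tokens."""
    a_sorted = " ".join(sorted(a.lower().split()))
    b_sorted = " ".join(sorted(b.lower().split()))
    return SequenceMatcher(None, a_sorted, b_sorted).ratio()


def is_strict_match(pred: str, gold: str, threshold: float = 0.80) -> bool:
    """Hypothetical strict matcher: token-sorted similarity must reach the threshold."""
    return token_sort_similarity(pred, gold) >= threshold


# Word order does not matter after token sorting.
print(is_strict_match("Foundation National Science", "National Science Foundation"))
# An acronym shares few characters with its expansion.
print(is_strict_match("NIH", "National Institutes of Health"))
```

Token sorting makes the comparison insensitive to word order, but an acronym still scores poorly against its expansion, which is the kind of case the Permissive and Balanced modes treat more leniently.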
Comparison to the Llama 3.1 8B baseline
Balanced-mode F1 on the two test sets reported by the Llama baseline card:
arxiv_test (300 examples)
| Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
|---|---|---|---|
| Funder | 0.9001 | 0.8921 | −0.008 |
| Award ID | 0.8780 | 0.8810 | +0.003 |
| Scheme | 0.6466 | 0.7266 | +0.080 |
| Title | 0.5316 | 0.5507 | +0.019 |
test_degraded (1,288 examples)
| Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
|---|---|---|---|
| Funder | 0.8999 | 0.8953 | −0.005 |
| Award ID | 0.8477 | 0.8403 | −0.007 |
| Scheme | 0.6370 | 0.6417 | +0.005 |
| Title | 0.4110 | 0.3011 | −0.110 |
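Funder and award ID (the reward-weighted fields) are within 0.008 F1 of the Llama baseline on both sets. Scheme and title carry zero reward weight, so the larger differences on those fields (up to +0.080 for scheme and −0.110 for title) are not something the reward directly optimizes.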
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ADAPTER_ID = "cometadata/funding-extraction-qwen3.5-9B-non-thinking-artifact-data-mix-grpo-mixed-reward"

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)

prompt = """Extract funding information from the following statement:
This work was supported by the National Science Foundation under grant DMS-1613002 and by the NIH (R01-AI123456)."""

messages = [
    {"role": "system", "content": "You are an expert at extracting structured funding metadata from academic papers. Given a funding statement, extract all funders and their associated awards. Return a JSON array of funder objects. Each funder has:\n- \"funder_name\": string or null\n- \"awards\": array of objects with \"award_ids\" (array of strings), \"funding_scheme\" (array of strings), and \"award_title\" (array of strings)\nReturn ONLY the JSON array, no other text."},
    {"role": "user", "content": prompt},
]

# Model trained with thinking disabled; keep enable_thinking=False.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
Output Format
[
  {
    "funder_name": "National Science Foundation",
    "awards": [
      {
        "award_ids": ["DMS-1613002"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  },
  {
    "funder_name": "NIH",
    "awards": [
      {
        "award_ids": ["R01-AI123456"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  }
]
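Downstream code will usually want to parse this output and check its shape before using it. The following is a minimal sketch of such a check, using only the standard library. The function names are hypothetical, and treating missing keys as empty lists is an assumption rather than documented model behavior.

```python
import json

LIST_FIELDS = ("award_ids", "funding_scheme", "award_title")


def parse_funders(raw: str) -> list:
    """Parse model output and check it against the schema shown above."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of funder objects")
    for funder in data:
        if not isinstance(funder, dict):
            raise ValueError("each funder must be a JSON object")
        name = funder.get("funder_name")
        if name is not None and not isinstance(name, str):
            raise ValueError("funder_name must be a string or null")
        awards = funder.get("awards", [])  # assumption: a missing key means no awards
        if not isinstance(awards, list):
            raise ValueError("awards must be an array")
        for award in awards:
            if not isinstance(award, dict):
                raise ValueError("each award must be a JSON object")
            for field in LIST_FIELDS:
                values = award.get(field, [])
                if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
                    raise ValueError(f"{field} must be an array of strings")
    return data


def flatten_award_ids(funders: list) -> list:
    """Return (funder_name, award_id) pairs in document order."""
    return [
        (funder.get("funder_name"), award_id)
        for funder in funders
        for award in funder.get("awards", [])
        for award_id in award.get("award_ids", [])
    ]


raw_output = """[
  {"funder_name": "National Science Foundation",
   "awards": [{"award_ids": ["DMS-1613002"], "funding_scheme": [], "award_title": []}]},
  {"funder_name": "NIH",
   "awards": [{"award_ids": ["R01-AI123456"], "funding_scheme": [], "award_title": []}]}
]"""
print(flatten_award_ids(parse_funders(raw_output)))
```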