---
base_model: Qwen/Qwen3.5-9B
library_name: peft
license: apache-2.0
datasets:
  - cometadata/funding-extraction-artifact-data-mix-grpo-mixed-reward
tags:
  - funding-extraction
  - lora
  - grpo
  - rl
  - scholarly-metadata
language:
  - en
pipeline_tag: text-generation
---

# Funding Extraction LoRA (Qwen3.5-9B)

LoRA adapter for extracting structured funding metadata (funder names and award IDs) from funding statements in academic papers. Fine-tuned from `Qwen/Qwen3.5-9B` with supervised fine-tuning (SFT) followed by GRPO reinforcement learning.

This is the Qwen3.5-9B counterpart to [`cometadata/funding-extraction-llama-3.1-8b-instruct-artifact-data-mix-grpo-mixed-reward`](https://huggingface.co/cometadata/funding-extraction-llama-3.1-8b-instruct-artifact-data-mix-grpo-mixed-reward), trained with the same data, pipeline, and reward. See [Comparison to the Llama 3.1 8B baseline](#comparison-to-the-llama-31-8b-baseline) below.

## Training Pipeline

Trained on the [`cometadata/funding-extraction-artifact-data-mix-grpo-mixed-reward`](https://huggingface.co/datasets/cometadata/funding-extraction-artifact-data-mix-grpo-mixed-reward) dataset, using its pre-defined `sft` / `rl` / `test` splits, on the [Tinker](https://thinkingmachines.ai) training service.

### Stage 1: Supervised Fine-Tuning (SFT)

- **Base model:** `Qwen/Qwen3.5-9B`
- **Data (`data/sft/`):** 3,528 real + 7,240 synthetic funding statements with gold-standard funder/award labels (synthetic upsampled 2×)
- **Data augmentation:** 50% of training examples augmented with synthetic noise (OCR-like case errors, digit/letter swaps, Unicode artifacts, XML/HTML tags, LaTeX markup)
- **Renderer:** `qwen3_5_disable_thinking` (no chain-of-thought; keep thinking disabled at inference, see [Usage](#usage))
- **LoRA rank:** 128
- **Epochs:** 2
- **Result:** eval NLL 0.116 → 0.0035 over 252 steps
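
The noise augmentation listed above can be pictured with a minimal sketch. The function names, swap table, and per-character rates below are hypothetical and chosen only for illustration; the card does not specify the actual augmentation code, only the noise types and the 50% application rate.

```python
import random

# Hypothetical look-alike pairs for OCR-style digit/letter confusion.
OCR_SWAPS = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", "S": "5"}


def case_errors(text: str, rng: random.Random, rate: float = 0.05) -> str:
    """Flip the case of each character with probability `rate`."""
    return "".join(ch.swapcase() if rng.random() < rate else ch for ch in text)


def digit_letter_swaps(text: str, rng: random.Random, rate: float = 0.1) -> str:
    """Replace confusable characters (0/O, 1/l, 5/S) with probability `rate`."""
    return "".join(
        OCR_SWAPS[ch] if ch in OCR_SWAPS and rng.random() < rate else ch
        for ch in text
    )


def unicode_artifacts(text: str, rng: random.Random) -> str:
    """Swap ASCII hyphens for en dashes, a common copy/paste artifact."""
    return text.replace("-", "\u2013")


def markup_tags(text: str, rng: random.Random) -> str:
    """Wrap the statement in a leftover XML/HTML tag."""
    return f"<p>{text}</p>"


def latex_markup(text: str, rng: random.Random) -> str:
    """Wrap the statement in a LaTeX formatting command."""
    return "\\textit{" + text + "}"


NOISE_FNS = (case_errors, digit_letter_swaps, unicode_artifacts,
             markup_tags, latex_markup)


def augment(text: str, rng: random.Random, p: float = 0.5) -> str:
    """With probability `p`, apply one randomly chosen noise function."""
    if rng.random() >= p:
        return text
    return rng.choice(NOISE_FNS)(text, rng)


statement = "Supported by the NSF under grant DMS-1613002."
print(augment(statement, random.Random(0), p=1.0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible across runs, which makes it easier to compare training configurations.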

### Stage 2: Reinforcement Learning (GRPO)

- **Algorithm:** Group Relative Policy Optimization (GRPO) with importance sampling loss
- **Data (`data/rl/`):** 1,160 real + 1,916 synthetic (train); 576 real + 968 synthetic (eval)
- **Reward:** Hierarchical F0.5 scoring with binary funder and award-ID matching, plus a flat award-ID association bonus
  - `reward = 0.50 * funder_F0.5 + 0.40 * hierarchical_award_id_F0.5 + 0.10 * flat_award_id_F0.5`
  - Funder matching: fuzzy, counted as a match when `token_sort_ratio` ≥ 0.80, with pairs chosen by Hungarian optimal assignment
  - Award ID matching: exact match after normalization (strip whitespace, hyphens, and slashes; uppercase), with soft partial credit for IDs within edit distance 1 during training
  - Flat award-ID term: gives partial credit when the correct award ID is extracted under the wrong funder, providing a gradient signal on funder-award association errors
- **KL penalty:** 0.03 (anchored to SFT checkpoint)
- **Group size:** 8 rollouts per prompt
- **Temperature:** 0.8
- **Learning rate:** 3e-5
- **Steps:** 193 batches
- **Checkpoint:** final (batch 193)
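
As a rough illustration of how the reward terms combine, the sketch below implements the award-ID normalization, an F-beta score from match counts, and the weighted sum from the formula above. It is a simplified sketch under stated assumptions, not the training code: the fuzzy funder matching, Hungarian assignment, and edit-distance-1 partial credit are omitted, the function names are hypothetical, and scoring an empty prediction against an empty reference as 0.0 is an assumption.

```python
import re


def normalize_award_id(award_id: str) -> str:
    """Strip whitespace, hyphens, and slashes, then uppercase."""
    return re.sub(r"[\s\-/]", "", award_id).upper()


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from match counts; beta < 1 weights precision over recall."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumption: with nothing predicted and nothing expected, score 0.0.
    return (1 + b2) * tp / denom if denom else 0.0


def award_id_f05(predicted: list[str], gold: list[str]) -> float:
    """Exact-match F0.5 over normalized award IDs."""
    pred = {normalize_award_id(a) for a in predicted}
    ref = {normalize_award_id(a) for a in gold}
    tp = len(pred & ref)
    return f_beta(tp, fp=len(pred) - tp, fn=len(ref) - tp)


def mixed_reward(funder_f05: float, hierarchical_award_f05: float,
                 flat_award_f05: float) -> float:
    """Weighted sum of the three component scores (weights from the card)."""
    return (0.50 * funder_f05
            + 0.40 * hierarchical_award_f05
            + 0.10 * flat_award_f05)


# A correct funder whose award ID landed under the wrong funder: the
# hierarchical term scores 0, but the flat term still gives credit.
print(round(mixed_reward(1.0, 0.0, 1.0), 2))
```

The flat term is what keeps the reward from collapsing to the funder score alone when an ID is extracted correctly but attached to the wrong funder.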

## Evaluation Results

### `arxiv_test.jsonl` (300 held-out examples)

#### Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9384 | 0.9362 | 0.9373 | 0.9379 | 0.9369 |
| Award ID | 0.9069 | 0.8909 | 0.8988 | 0.9037 | 0.8957 |
| Scheme | 0.7407 | 0.8264 | 0.7812 | 0.7564 | 0.7980 |
| Title | 0.9048 | 0.3958 | 0.5507 | 0.7197 | 0.4787 |

#### Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.8882 | 0.8960 | 0.8921 | 0.8897 | 0.8936 |
| Award ID | 0.8889 | 0.8732 | 0.8810 | 0.8857 | 0.8779 |
| Scheme | 0.6889 | 0.7686 | 0.7266 | 0.7035 | 0.7422 |
| Title | 0.9048 | 0.3958 | 0.5507 | 0.7197 | 0.4787 |

#### Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.8796 | 0.8874 | 0.8835 | 0.8812 | 0.8850 |
| Award ID | 0.8859 | 0.8702 | 0.8780 | 0.8827 | 0.8750 |
| Scheme | 0.6667 | 0.7438 | 0.7031 | 0.6808 | 0.7182 |
| Title | 0.8095 | 0.3542 | 0.4928 | 0.6439 | 0.4283 |

All 300 outputs were valid JSON.

### funding-entity-extraction-dataset-mix test sets

Evaluated on the held-out test sets from [`cometadata/funding-entity-extraction-dataset-mix`](https://huggingface.co/datasets/cometadata/funding-entity-extraction-dataset-mix) with the same evaluation harness. `test_with_context` uses the `full_text` field (the funding statement with its surrounding document text) as the model input.

#### `test.jsonl` (347 examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9376 | 0.8923 | 0.9144 | 0.9282 | 0.9058 |
| Award ID | 0.8407 | 0.8339 | 0.8373 | 0.8394 | 0.8360 |
| Scheme | 0.4118 | 0.5927 | 0.4860 | 0.4385 | 0.5221 |
| Title | 0.1034 | 0.0170 | 0.0293 | 0.0514 | 0.0229 |

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9008 | 0.8555 | 0.8776 | 0.8913 | 0.8689 |
| Award ID | 0.8138 | 0.8072 | 0.8105 | 0.8125 | 0.8092 |
| Scheme | 0.3725 | 0.5363 | 0.4397 | 0.3968 | 0.4724 |
| Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.8722 | 0.8276 | 0.8493 | 0.8629 | 0.8408 |
| Award ID | 0.7963 | 0.7898 | 0.7930 | 0.7949 | 0.7918 |
| Scheme | 0.3333 | 0.4798 | 0.3934 | 0.3550 | 0.4227 |
| Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |

#### `test_degraded.jsonl` (1,288 examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9285 | 0.9216 | 0.9250 | 0.9271 | 0.9237 |
| Award ID | 0.8586 | 0.8560 | 0.8573 | 0.8581 | 0.8568 |
| Scheme | 0.7413 | 0.6704 | 0.7041 | 0.7260 | 0.6907 |
| Title | 0.7723 | 0.2267 | 0.3506 | 0.5214 | 0.2897 |

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9001 | 0.8906 | 0.8953 | 0.8981 | 0.8935 |
| Award ID | 0.8416 | 0.8390 | 0.8403 | 0.8411 | 0.8398 |
| Scheme | 0.6757 | 0.6110 | 0.6417 | 0.6617 | 0.6296 |
| Title | 0.6634 | 0.1948 | 0.3011 | 0.4479 | 0.2489 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.8801 | 0.8690 | 0.8745 | 0.8778 | 0.8724 |
| Award ID | 0.8317 | 0.8291 | 0.8304 | 0.8312 | 0.8299 |
| Scheme | 0.6039 | 0.5461 | 0.5735 | 0.5913 | 0.5627 |
| Title | 0.6139 | 0.1802 | 0.2787 | 0.4144 | 0.2303 |

#### `test_with_context.jsonl` (322 examples)

Permissive (partial_ratio + token_set, no damping)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9348 | 0.9383 | 0.9365 | 0.9355 | 0.9372 |
| Award ID | 0.8711 | 0.8690 | 0.8700 | 0.8707 | 0.8696 |
| Scheme | 0.7515 | 0.6844 | 0.7164 | 0.7371 | 0.7037 |
| Title | 0.8750 | 0.2442 | 0.3818 | 0.5769 | 0.3138 |

Balanced (length-damped + acronym detection)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.9072 | 0.9061 | 0.9066 | 0.9070 | 0.9064 |
| Award ID | 0.8538 | 0.8517 | 0.8527 | 0.8534 | 0.8523 |
| Scheme | 0.6871 | 0.6257 | 0.6550 | 0.6739 | 0.6434 |
| Title | 0.7500 | 0.2093 | 0.3273 | 0.4945 | 0.2690 |

Strict (token_sort_ratio only)

| Field | P | R | F1 | F0.5 | F1.5 |
|-------|---|---|----|----|------|
| Funder | 0.8863 | 0.8842 | 0.8852 | 0.8859 | 0.8848 |
| Award ID | 0.8439 | 0.8418 | 0.8428 | 0.8434 | 0.8424 |
| Scheme | 0.6074 | 0.5531 | 0.5789 | 0.5957 | 0.5687 |
| Title | 0.7083 | 0.1977 | 0.3091 | 0.4670 | 0.2540 |

### Comparison to the Llama 3.1 8B baseline

Balanced-mode F1 on the two test sets reported by the Llama baseline card:

**`arxiv_test` (300 examples)**

| Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
|-------|:---:|:---:|:---:|
| Funder | 0.9001 | 0.8921 | −0.008 |
| Award ID | 0.8780 | 0.8810 | +0.003 |
| Scheme | 0.6466 | 0.7266 | +0.080 |
| Title | 0.5316 | 0.5507 | +0.019 |

**`test_degraded` (1,288 examples)**

| Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
|-------|:---:|:---:|:---:|
| Funder | 0.8999 | 0.8953 | −0.005 |
| Award ID | 0.8477 | 0.8403 | −0.007 |
| Scheme | 0.6370 | 0.6417 | +0.005 |
| Title | 0.4110 | 0.3011 | −0.110 |

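Funder and award ID, the two fields the reward weights, are within 0.008 F1 of the Llama baseline on both sets. Scheme and title carry zero reward weight, so the larger swings on those fields (+0.080 for scheme on `arxiv_test`, −0.110 for title on `test_degraded`) fall outside what GRPO optimized directly.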

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, "cometadata/funding-extraction-qwen3.5-9B-artifact-data-mix-grpo-mixed-reward")
tokenizer = AutoTokenizer.from_pretrained("cometadata/funding-extraction-qwen3.5-9B-artifact-data-mix-grpo-mixed-reward")

prompt = """Extract funding information from the following statement:

This work was supported by the National Science Foundation under grant DMS-1613002 and by the NIH (R01-AI123456)."""

messages = [
    {"role": "system", "content": "You are an expert at extracting structured funding metadata from academic papers. Given a funding statement, extract all funders and their associated awards. Return a JSON array of funder objects. Each funder has:\n- \"funder_name\": string or null\n- \"awards\": array of objects with \"award_ids\" (array of strings), \"funding_scheme\" (array of strings), and \"award_title\" (array of strings)\nReturn ONLY the JSON array, no other text."},
    {"role": "user", "content": prompt},
]

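# The model was trained with thinking disabled, so keep enable_thinking=False.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,  # return input_ids together with the attention mask
    return_tensors="pt",
).to(model.device)
# Greedy decoding; decode only the newly generated tokens.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))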
```

## Output Format

```json
[
  {
    "funder_name": "National Science Foundation",
    "awards": [
      {
        "award_ids": ["DMS-1613002"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  },
  {
    "funder_name": "NIH",
    "awards": [
      {
        "award_ids": ["R01-AI123456"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  }
]
```
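
Downstream code may want to check that a generation matches this shape before using it. The following is a minimal sketch using only the standard library; `parse_funders` is a hypothetical helper, and treating missing keys as empty arrays is an assumption rather than documented model behavior.

```python
import json


def parse_funders(raw: str) -> list[dict]:
    """Parse model output and check it against the format shown above.

    Raises ValueError on invalid JSON (json.JSONDecodeError is a subclass
    of ValueError) or on an unexpected structure.
    """
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of funder objects")
    for funder in data:
        if not isinstance(funder, dict):
            raise ValueError("each funder must be a JSON object")
        name = funder.get("funder_name")
        if name is not None and not isinstance(name, str):
            raise ValueError("funder_name must be a string or null")
        awards = funder.get("awards", [])
        if not isinstance(awards, list):
            raise ValueError("awards must be an array")
        for award in awards:
            if not isinstance(award, dict):
                raise ValueError("each award must be a JSON object")
            for key in ("award_ids", "funding_scheme", "award_title"):
                values = award.get(key, [])  # assumption: missing means empty
                if not isinstance(values, list) or not all(
                    isinstance(v, str) for v in values
                ):
                    raise ValueError(f"{key} must be an array of strings")
    return data


sample = (
    '[{"funder_name": "NIH", "awards": [{"award_ids": ["R01-AI123456"], '
    '"funding_scheme": [], "award_title": []}]}]'
)
for funder in parse_funders(sample):
    ids = [i for award in funder["awards"] for i in award["award_ids"]]
    print(funder["funder_name"], ids)
```

Catching `ValueError` around the call covers both malformed JSON and structural problems in one place.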