Text Generation
PEFT
Safetensors
English
funding-extraction
lora
grpo
rl
scholarly-metadata
conversational
Instructions to use cometadata/funding-extraction-qwen3.5-9B-non-thinking-artifact-data-mix-grpo-mixed-reward with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use cometadata/funding-extraction-qwen3.5-9B-non-thinking-artifact-data-mix-grpo-mixed-reward with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B")
model = PeftModel.from_pretrained(base_model, "cometadata/funding-extraction-qwen3.5-9B-non-thinking-artifact-data-mix-grpo-mixed-reward")
- Notebooks
- Google Colab
- Kaggle
Trim editorial prose from model card
README.md
CHANGED
@@ -29,8 +29,8 @@ Trained on the [`cometadata/funding-extraction-artifact-data-mix-grpo-mixed-rewa

 - **Base model:** `Qwen/Qwen3.5-9B`
 - **Data (`data/sft/`):** 3,528 real + 7,240 synthetic funding statements with gold-standard funder/award labels (synthetic upsampled 2×)
-- **Data augmentation:** 50% of training examples augmented with synthetic noise (OCR-like case errors, digit/letter swaps, Unicode artifacts, XML/HTML tags, LaTeX markup)
-- **Renderer:** `qwen3_5_disable_thinking`
 - **LoRA rank:** 128
 - **Epochs:** 2
 - **Result:** eval NLL 0.116 → 0.0035 over 252 steps
@@ -82,11 +82,11 @@ Trained on the [`cometadata/funding-extraction-artifact-data-mix-grpo-mixed-rewa
 | Scheme | 0.6667 | 0.7438 | 0.7031 | 0.6808 | 0.7182 |
 | Title | 0.8095 | 0.3542 | 0.4928 | 0.6439 | 0.4283 |

-

 ### funding-entity-extraction-dataset-mix test sets

-Evaluated on the held-out test sets from [`cometadata/funding-entity-extraction-dataset-mix`](https://huggingface.co/datasets/cometadata/funding-entity-extraction-dataset-mix)

 #### `test.jsonl` (347 examples)

@@ -117,7 +117,7 @@ Strict (token_sort_ratio only)
 | Scheme | 0.3333 | 0.4798 | 0.3934 | 0.3550 | 0.4227 |
 | Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |

-#### `test_degraded.jsonl` (1,288 examples

 Permissive (partial_ratio + token_set, no damping)

@@ -146,7 +146,7 @@ Strict (token_sort_ratio only)
 | Scheme | 0.6039 | 0.5461 | 0.5735 | 0.5913 | 0.5627 |
 | Title | 0.6139 | 0.1802 | 0.2787 | 0.4144 | 0.2303 |

-#### `test_with_context.jsonl` (322 examples

 Permissive (partial_ratio + token_set, no damping)

@@ -177,7 +177,7 @@ Strict (token_sort_ratio only)

 ### Comparison to the Llama 3.1 8B baseline

-

 **arxiv_test (300 examples)**

@@ -188,7 +188,7 @@ Both test sets the Llama baseline card reports, scored with the same harness and
 | Scheme | 0.6466 | 0.7266 | +0.080 |
 | Title | 0.5316 | 0.5507 | +0.019 |

-**

 | Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
 |-------|:---:|:---:|:---:|
@@ -197,7 +197,7 @@ Both test sets the Llama baseline card reports, scored with the same harness and
 | Scheme | 0.6370 | 0.6417 | +0.005 |
 | Title | 0.4110 | 0.3011 | −0.110 |

-

 ## Usage

@@ -218,7 +218,7 @@ messages = [
     {"role": "user", "content": prompt},
 ]

-#
 inputs = tokenizer.apply_chat_template(
     messages, return_tensors="pt", add_generation_prompt=True, enable_thinking=False
 )
@@ -29,8 +29,8 @@ Trained on the [`cometadata/funding-extraction-artifact-data-mix-grpo-mixed-rewa

 - **Base model:** `Qwen/Qwen3.5-9B`
 - **Data (`data/sft/`):** 3,528 real + 7,240 synthetic funding statements with gold-standard funder/award labels (synthetic upsampled 2×)
+- **Data augmentation:** 50% of training examples augmented with synthetic noise (OCR-like case errors, digit/letter swaps, Unicode artifacts, XML/HTML tags, LaTeX markup)
+- **Renderer:** `qwen3_5_disable_thinking` (no chain-of-thought; keep thinking disabled at inference, see [Usage](#usage))
 - **LoRA rank:** 128
 - **Epochs:** 2
 - **Result:** eval NLL 0.116 → 0.0035 over 252 steps
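
The data-augmentation bullet in this hunk names the noise types but not how they are applied. The sketch below is one plausible reading, not the authors' pipeline: every function name, the swap table, the tag names, and the choice of one noise operation per augmented example are assumptions made for illustration. Only the 50% rate and the list of noise categories come from the card.

```python
import random

# Hypothetical look-alike table for digit/letter swaps (not from the model card).
DIGIT_LETTER_SWAPS = {"0": "O", "1": "l", "5": "S", "O": "0", "l": "1", "S": "5"}


def flip_case(text, rng):
    """OCR-like case error: flip the case of one randomly chosen letter."""
    positions = [i for i, ch in enumerate(text) if ch.isalpha()]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + text[i].swapcase() + text[i + 1:]


def swap_digit_letter(text, rng):
    """Replace one digit or letter with a visually similar character."""
    positions = [i for i, ch in enumerate(text) if ch in DIGIT_LETTER_SWAPS]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + DIGIT_LETTER_SWAPS[text[i]] + text[i + 1:]


def insert_unicode_artifact(text, rng):
    """Insert a soft hyphen (U+00AD) at a random position."""
    i = rng.randrange(len(text) + 1)
    return text[:i] + "\u00ad" + text[i:]


def wrap_in_tag(text, rng):
    """Wrap the statement in an XML/HTML-style tag (tag names are made up)."""
    tag = rng.choice(["p", "i", "funding-statement"])
    return f"<{tag}>{text}</{tag}>"


def wrap_in_latex(text, rng):
    """Wrap the statement in a LaTeX command."""
    return "\\textit{" + text + "}"


NOISE_OPS = [flip_case, swap_digit_letter, insert_unicode_artifact,
             wrap_in_tag, wrap_in_latex]


def augment(text, rng, p=0.5):
    """With probability p, apply one randomly chosen noise operation."""
    if rng.random() >= p:
        return text
    return rng.choice(NOISE_OPS)(text, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(4):
        print(augment("Supported by NSF grant 1054321.", rng))
```

Passing an explicit `random.Random` instance keeps the noise reproducible across runs, which matters if the augmented set needs to be regenerated for a later comparison.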
@@ -82,11 +82,11 @@ Trained on the [`cometadata/funding-extraction-artifact-data-mix-grpo-mixed-rewa
 | Scheme | 0.6667 | 0.7438 | 0.7031 | 0.6808 | 0.7182 |
 | Title | 0.8095 | 0.3542 | 0.4928 | 0.6439 | 0.4283 |

+All 300 outputs were valid JSON.

 ### funding-entity-extraction-dataset-mix test sets

+Evaluated on the held-out test sets from [`cometadata/funding-entity-extraction-dataset-mix`](https://huggingface.co/datasets/cometadata/funding-entity-extraction-dataset-mix) with the same evaluation harness. `test_with_context` uses the `full_text` field (the funding statement with its surrounding document text) as the model input.

 #### `test.jsonl` (347 examples)

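
The column headers of the metric tables fall outside the lines shown in this diff. The five numeric columns in the Scheme and Title rows are arithmetically consistent with precision, recall, F1, F0.5, and F1.5, in that order; that reading is an inference from the numbers, not something the visible text states. A minimal check, using the standard F-beta formula:

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rows copied from the diff context: (precision, recall, then the three reported values).
rows = {
    "Scheme": (0.6667, 0.7438, 0.7031, 0.6808, 0.7182),
    "Title": (0.8095, 0.3542, 0.4928, 0.6439, 0.4283),
}

for field, (p, r, f1, f05, f15) in rows.items():
    recomputed = [f_beta(p, r, beta) for beta in (1.0, 0.5, 1.5)]
    deltas = [abs(a - b) for a, b in zip(recomputed, (f1, f05, f15))]
    # Small deltas are expected because the table values are rounded to four places.
    print(field, [round(v, 4) for v in recomputed], "max delta:", round(max(deltas), 4))
```

The Title row shows why the weighted variants are worth reporting: precision is high and recall is low, so the precision-weighted score sits well above the recall-weighted one.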
@@ -117,7 +117,7 @@ Strict (token_sort_ratio only)
 | Scheme | 0.3333 | 0.4798 | 0.3934 | 0.3550 | 0.4227 |
 | Title | 0.0690 | 0.0114 | 0.0195 | 0.0342 | 0.0153 |

+#### `test_degraded.jsonl` (1,288 examples)

 Permissive (partial_ratio + token_set, no damping)

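
The mode labels in this diff, "Strict (token_sort_ratio only)" and "Permissive (partial_ratio + token_set, no damping)", resemble scorer names from fuzzy-matching libraries such as RapidFuzz, but the evaluation harness itself is not shown here. The sketch below is a standard-library approximation built on `difflib`, intended only to illustrate why a token-set style score is more forgiving than a token-sort style score. The function names are hypothetical, the scores will not match a Levenshtein-based library exactly, and `partial_ratio` and damping are not modelled.

```python
from difflib import SequenceMatcher


def _ratio(a, b):
    """Similarity on a 0-100 scale (difflib-based, not Levenshtein-based)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def _tokens(text):
    return text.lower().split()


def token_sort_score(a, b):
    """Compare the strings after sorting their tokens; extra tokens still cost."""
    return _ratio(" ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b))))


def token_set_score(a, b):
    """Compare shared tokens against each side's full token set and keep the best."""
    ta, tb = set(_tokens(a)), set(_tokens(b))
    common = " ".join(sorted(ta & tb))
    with_a = (common + " " + " ".join(sorted(ta - tb))).strip()
    with_b = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(_ratio(common, with_a), _ratio(common, with_b), _ratio(with_a, with_b))


if __name__ == "__main__":
    gold = "National Science Foundation"
    pred = "US National Science Foundation"
    print("token sort:", round(token_sort_score(gold, pred), 1))
    print("token set: ", round(token_set_score(gold, pred), 1))
```

When one name's tokens are a subset of the other's, the set-based score reaches 100 while the sort-based score is penalised for the extra token, which is in line with the permissive tables reporting higher numbers than the strict ones.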
@@ -146,7 +146,7 @@ Strict (token_sort_ratio only)
 | Scheme | 0.6039 | 0.5461 | 0.5735 | 0.5913 | 0.5627 |
 | Title | 0.6139 | 0.1802 | 0.2787 | 0.4144 | 0.2303 |

+#### `test_with_context.jsonl` (322 examples)

 Permissive (partial_ratio + token_set, no damping)

@@ -177,7 +177,7 @@ Strict (token_sort_ratio only)

 ### Comparison to the Llama 3.1 8B baseline

+Balanced-mode F1 on the two test sets reported by the Llama baseline card:

 **arxiv_test (300 examples)**

@@ -188,7 +188,7 @@ Both test sets the Llama baseline card reports, scored with the same harness and
 | Scheme | 0.6466 | 0.7266 | +0.080 |
 | Title | 0.5316 | 0.5507 | +0.019 |

+**`test_degraded` (1,288 examples)**

 | Field | Llama 3.1 8B | Qwen3.5-9B | Δ |
 |-------|:---:|:---:|:---:|
@@ -197,7 +197,7 @@ Both test sets the Llama baseline card reports, scored with the same harness and
 | Scheme | 0.6370 | 0.6417 | +0.005 |
 | Title | 0.4110 | 0.3011 | −0.110 |

+Funder and award ID (the reward-weighted fields) are within 0.008 F1 of the Llama baseline on both sets. Scheme and title carry zero reward weight.

 ## Usage

@@ -218,7 +218,7 @@ messages = [
     {"role": "user", "content": prompt},
 ]

+# Model trained with thinking disabled; keep enable_thinking=False.
 inputs = tokenizer.apply_chat_template(
     messages, return_tensors="pt", add_generation_prompt=True, enable_thinking=False
 )
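
The usage hunk stops at the chat-template call, and the card reports that all 300 outputs on one test set were valid JSON. A small validity check along these lines could be run on decoded generations. This is a sketch under assumptions: the helper names are hypothetical, and the sample strings (including the field names inside them) are invented for illustration, since the model's output schema is not shown in this diff.

```python
import json


def parse_model_output(text):
    """Return (True, parsed_value) if text is valid JSON, else (False, None)."""
    try:
        return True, json.loads(text.strip())
    except json.JSONDecodeError:
        return False, None


def valid_json_rate(outputs):
    """Fraction of decoded generations that parse as JSON."""
    if not outputs:
        return 0.0
    return sum(parse_model_output(o)[0] for o in outputs) / len(outputs)


if __name__ == "__main__":
    # Invented sample generations; the real schema may differ.
    samples = [
        '[{"funder_name": "National Science Foundation", "award_ids": ["1054321"]}]',
        "Here are the funders: National Science Foundation",
    ]
    print(valid_json_rate(samples))
```

Returning a success flag alongside the parsed value avoids confusing a legitimate JSON `null` with a parse failure.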