funding-parsing-and-extraction-Llama-3.1-8B-Instruct-lora
Two-stage LoRA for joint funding-statement extraction (stage 1) and
funder parsing (stage 2), trained on top of
meta-llama/Llama-3.1-8B-Instruct.
Repository contents
sft/ # Stage A: supervised fine-tuning LoRA adapter
dapo/ # Stage B: DAPO (RL) LoRA adapter, applied after SFT is merged
Each folder is a standard PEFT LoRA: adapter_config.json +
adapter_model.safetensors.
What the model does
Given a research article (or a chunk of one), the adapter produces funding metadata in two stages:
- Extract: copy any funding-acknowledgment sentences verbatim.
- Parse: take those sentences and emit structured funder / award records.
Both stages share the same weights; only the system prompt changes.
System prompts
Both stages must be called with the exact prompts below.
Stage 1: extract
You are a funding statement extractor. Given an article or text chunk, identify all funding acknowledgment statements. Return ONLY valid JSON in this exact format: {"statements": ["statement1", "statement2", ...]}. Each statement must be copied verbatim from the source text. If no funding statements exist, return {"statements": []}. Do not include any text outside the JSON object.
User message: the article text (or chunk) as-is.
Stage 2: parse
You are an expert at extracting structured funding metadata from academic papers. Given a funding statement, extract all funders and their associated awards. Return a JSON array of funder objects. Each funder has:
- "funder_name": string or null
- "awards": array of objects with "award_ids" (array of strings), "funding_scheme" (array of strings), and "award_title" (array of strings)
Return ONLY the JSON array, no other text.
User message:
Extract funding information from the following statement:
<funding statement here>
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "meta-llama/Llama-3.1-8B-Instruct"
REPO = "cometadata/funding-parsing-and-extraction-Llama-3.1-8B-Instruct-lora"
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
# Stage A: apply and merge the SFT adapter
model = PeftModel.from_pretrained(model, REPO, subfolder="sft")
model = model.merge_and_unload()
# Stage B: apply and merge the DAPO adapter on top of the SFT-merged model
model = PeftModel.from_pretrained(model, REPO, subfolder="dapo")
model = model.merge_and_unload()
model.eval()
The DAPO adapter's deltas are computed relative to the SFT-merged base, so the adapters must be applied in order: SFT first, then DAPO.
Running stage 1 (extract)
SYSTEM_EXTRACT = (
    "You are a funding statement extractor. Given an article or text chunk, "
    "identify all funding acknowledgment statements. Return ONLY valid JSON "
    'in this exact format: {"statements": ["statement1", "statement2", ...]}. '
    "Each statement must be copied verbatim from the source text. "
    'If no funding statements exist, return {"statements": []}. '
    "Do not include any text outside the JSON object."
)
def extract(article_text: str) -> str:
    """Run stage 1 on one article or chunk and return the raw JSON string."""
    messages = [
        {"role": "system", "content": SYSTEM_EXTRACT},
        {"role": "user", "content": article_text},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Greedy decoding; only the newly generated tokens are decoded.
    out = model.generate(inputs, max_new_tokens=512, do_sample=False)
    return tokenizer.decode(out[0, inputs.shape[1]:], skip_special_tokens=True)
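The stage 1 call returns a raw string, so callers usually need to decode it before use. The following is a minimal sketch of that step, assuming the output follows the stage 1 format above. `parse_statements` is a hypothetical helper, not part of this repository, and treating malformed output as "no statements" is a choice made for this sketch only.

```python
import json


def parse_statements(raw: str) -> list[str]:
    """Decode a stage 1 response into a list of statements.

    Hypothetical helper: returns [] when the response is not the expected
    {"statements": [...]} object instead of raising.
    """
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    statements = payload.get("statements", [])
    if not isinstance(statements, list):
        return []
    # Keep only non-empty strings; anything else is treated as malformed.
    return [s for s in statements if isinstance(s, str) and s.strip()]
```

In practice you may prefer to log or retry on malformed output rather than silently returning an empty list.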
Running stage 2 (parse)
SYSTEM_PARSE = (
    "You are an expert at extracting structured funding metadata from academic papers. "
    "Given a funding statement, extract all funders and their associated awards. "
    "Return a JSON array of funder objects. Each funder has:\n"
    '- "funder_name": string or null\n'
    '- "awards": array of objects with "award_ids" (array of strings), '
    '"funding_scheme" (array of strings), and "award_title" (array of strings)\n'
    "Return ONLY the JSON array, no other text."
)
def parse(funding_statement: str) -> str:
    """Run stage 2 on a funding statement and return the raw JSON string."""
    user = f"Extract funding information from the following statement:\n\n{funding_statement}"
    messages = [
        {"role": "system", "content": SYSTEM_PARSE},
        {"role": "user", "content": user},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Greedy decoding; only the newly generated tokens are decoded.
    out = model.generate(inputs, max_new_tokens=1024, do_sample=False)
    return tokenizer.decode(out[0, inputs.shape[1]:], skip_special_tokens=True)
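The stage 2 response is likewise a raw string. As a rough sketch, assuming the output follows the schema in the stage 2 system prompt, it can be decoded and normalized as shown below. `parse_funders` and `_string_list` are hypothetical helpers introduced for illustration; dropping malformed entries rather than raising is an assumption of this sketch.

```python
import json

AWARD_KEYS = ("award_ids", "funding_scheme", "award_title")


def _string_list(value) -> list[str]:
    """Return value as a list of strings, or [] if it is not a list."""
    if not isinstance(value, list):
        return []
    return [v for v in value if isinstance(v, str)]


def parse_funders(raw: str) -> list[dict]:
    """Decode a stage 2 response into normalized funder records.

    Hypothetical helper: returns [] when the response is not a JSON array.
    """
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, list):
        return []
    funders = []
    for item in payload:
        if not isinstance(item, dict):
            continue
        raw_awards = item.get("awards")
        if not isinstance(raw_awards, list):
            raw_awards = []
        awards = [
            {key: _string_list(award.get(key)) for key in AWARD_KEYS}
            for award in raw_awards
            if isinstance(award, dict)
        ]
        name = item.get("funder_name")
        funders.append({
            "funder_name": name if isinstance(name, str) else None,
            "awards": awards,
        })
    return funders
```

Every returned record has a `funder_name` (string or `None`) and an `awards` list whose entries always carry the three list-valued keys, which keeps downstream code free of key checks.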
Chunked long-document pipeline
At training time, articles longer than ~7500 tokens are split into overlapping 7500-token chunks with a 1024-token overlap. For inference on long documents, apply stage 1 to each chunk, deduplicate and concatenate the extracted statements, then pass the joined text to stage 2 once per article.
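The long-document flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the training code: `chunk_token_ids`, `dedupe_statements`, and `run_pipeline` are hypothetical names, the whitespace-normalized deduplication key and the newline separator are choices made here, and `extract_fn` is assumed to return a list of statements for one chunk (for example, the stage 1 call followed by JSON decoding).

```python
from typing import Callable, Sequence


def chunk_token_ids(token_ids: Sequence[int], size: int = 7500,
                    overlap: int = 1024) -> list[list[int]]:
    """Split token ids into windows of `size` that share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if len(token_ids) <= size:
        return [list(token_ids)]
    stride = size - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(list(token_ids[start:start + size]))
        if start + size >= len(token_ids):
            break  # the last window already reaches the end of the text
    return chunks


def dedupe_statements(per_chunk: list[list[str]]) -> list[str]:
    """Merge per-chunk statements, dropping repeats from overlapping chunks."""
    seen = set()
    merged = []
    for statements in per_chunk:
        for statement in statements:
            key = " ".join(statement.split())  # ignore whitespace differences
            if key and key not in seen:
                seen.add(key)
                merged.append(statement)
    return merged


def run_pipeline(chunks: list[str],
                 extract_fn: Callable[[str], list[str]],
                 parse_fn: Callable[[str], str]) -> str:
    """Stage 1 per chunk, then a single stage 2 call on the joined text."""
    merged = dedupe_statements([extract_fn(chunk) for chunk in chunks])
    if not merged:
        return "[]"  # nothing to parse; skip the stage 2 call
    return parse_fn("\n".join(merged))
```

Token ids for chunking can be obtained with `tokenizer(text, add_special_tokens=False)["input_ids"]` and turned back into text with `tokenizer.decode`; whether this reproduces the exact chunk boundaries used in training is not confirmed by this card.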
Training recipe
- Base: meta-llama/Llama-3.1-8B-Instruct
- LoRA: r=64, α=128, dropout=0.05, target_modules=all-linear (q, k, v, o, gate, up, down proj)
- Data: Adam's cometadata/funding-extraction-artifact-data-mix-grpo-mixed-reward dataset (articles + gold funding statements + structured funder metadata)
Stage A: SFT
- 2 epochs, batch 2, grad-accum 16, bf16, max_length 8192
- LR 1e-4, 2×H100
- Mixture: extract prompts + parse prompts on rows with markdown and structured funder labels
- Trains the model to format JSON correctly for both stages
Stage B: DAPO (RL)
- Algorithm: GRPO variant with DAPO-style clipping (eps_clip_low=0.2, eps_clip_high=0.28, token-mean loss reduction)
- Regularization: use_kl_loss=true (coef=0.001), entropy_loss_coef=0.01; essential to prevent the all-empty collapse we observed without regularization
- Sampling: temperature=0.9, top_p=0.9, n_samples=16 per prompt, step-wise trajectories (10 extract rollouts + 1 parse rollout per article)
- Optimizer: LR 2e-5 (constant with 20-step warmup), max_grad_norm=1.0
- Reward: pure parse reward (funder F0.5 + award-id F0.5, soft ID matching; see below). An earlier variant added a per-chunk extract reward, which induced reward hacking (always emitting {"statements": []}), so the chunk component was dropped in favor of KL / entropy regularization.
- OCR pre-processing: zero-width / BOM characters are stripped from input markdown before chunking.
- Compute: 8×H100, ~14h wall, 36 DAPO steps
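The OCR pre-processing step in the list above can be approximated at inference time so that inputs resemble the training data. The sketch below is an assumption about what "zero-width / BOM characters" covers; the exact character set used in training is not stated here, and `strip_invisible` is a hypothetical helper name.

```python
import re

# Assumed set: zero-width space, zero-width non-joiner, zero-width joiner,
# word joiner, and the byte-order mark (zero-width no-break space).
_INVISIBLE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_invisible(text: str) -> str:
    """Remove zero-width and BOM characters from text before chunking."""
    return _INVISIBLE.sub("", text)
```

Note that zero-width joiners and non-joiners are meaningful in some scripts, so a narrower character set may be preferable for non-Latin text.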
Reward formula (stage B)
r = 0.50 · funder_F0.5 + 0.40 · award_id_F0.5 + 0.10 · flat_award_id_F0.5
with soft ID matching (edit-distance-1 partial credit on award_id).
When the gold record is empty, an empty prediction gets r=1.0; an empty prediction
against non-empty gold gets r=0.0.
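To make the weighting concrete, here is a small sketch of an F0.5 score computed from match counts and the weighted combination from the formula above. It uses exact-match counts only; the soft edit-distance matching and the precise definitions of the three components are not reproduced, and `f_beta` and `combined_reward` are hypothetical names. Scoring a non-empty prediction against empty gold as 0.0 is the standard F-score outcome, assumed here rather than taken from the training code.

```python
def f_beta(n_true_positive: int, n_predicted: int, n_gold: int,
           beta: float = 0.5) -> float:
    """F-beta from counts; beta < 1 weights precision more than recall."""
    if n_predicted == 0 and n_gold == 0:
        return 1.0  # empty gold matched by an empty prediction
    if n_predicted == 0 or n_gold == 0 or n_true_positive == 0:
        return 0.0
    precision = n_true_positive / n_predicted
    recall = n_true_positive / n_gold
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def combined_reward(funder_f: float, award_id_f: float,
                    flat_award_id_f: float) -> float:
    """Weighted sum with the 0.50 / 0.40 / 0.10 weights from the formula."""
    return 0.50 * funder_f + 0.40 * award_id_f + 0.10 * flat_award_id_f
```

With beta = 0.5, a prediction that adds one spurious funder to one correct funder (precision 0.5, recall 1.0) scores about 0.56, lower than the 0.67 it would get under F1, which reflects the precision emphasis of the reward.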
Final training-time metrics (rollout average over last 5 steps)
| metric | value |
|---|---|
| reward/avg_raw_reward | ~0.90 |
| reward/avg_pass_at_16 | ~0.99 |
| avg response length | ~30 tokens |
Best single step: reward 0.941, pass@16 1.0 at step 33/36.
Citation / provenance
Training code: [github TODO: the rl/ directory of the
funding-statement-identification repo].
Trained by the comet-data / funding extraction effort, 2026-04.