---
library_name: peft
base_model: meta-llama/Llama-3.1-8B-Instruct
datasets:
- cometadata/funding-extraction-sft-data
tags:
- lora
- funding-extraction
- grpo
- rl
license: llama3.1
---

# Funding Parsing LoRA — Llama 3.1 8B Instruct + GRPO

A LoRA adapter for extracting structured funding information from funding statements in scholarly works.

## Model Details

- **Base model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Method**: Supervised fine-tuning (SFT) followed by reinforcement learning (GRPO)
- **Task**: Given a funding statement, extract a structured JSON array of funders with their award IDs
- **Training data**: [cometadata/funding-extraction-sft-data](https://huggingface.co/datasets/cometadata/funding-extraction-sft-data)

## Training Pipeline

### Stage 1: Supervised Fine-Tuning

- **Data**: [cometadata/funding-extraction-sft-data](https://huggingface.co/datasets/cometadata/funding-extraction-sft-data): 1,316 real examples (`train.jsonl`) plus 2,531 synthetic examples (`synthetic.jsonl`), with the synthetic set upsampled 2x, for 6,378 training examples in total
- **Epochs**: 2
- **LoRA rank**: 64
- **LoRA alpha**: 32
- **Learning rate**: ~2.86e-4
- **Batch size**: 128
- **Max sequence length**: 4,096 tokens
- **LR schedule**: Linear decay
- **Renderer**: llama3 (Llama 3.1 Instruct chat template)
- **Train on**: Last assistant message only
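
As a quick sanity check on the numbers above, the following sketch recomputes the size of the SFT mixture and the effective LoRA scaling factor. It assumes the standard LoRA convention in which the low-rank update is scaled by `alpha / rank`; the variable names are illustrative and are not taken from the training code.

```python
# Dataset sizes as listed in the Stage 1 configuration.
real_examples = 1_316        # train.jsonl
synthetic_examples = 2_531   # synthetic.jsonl
synthetic_upsample = 2       # synthetic examples are repeated 2x

sft_total = real_examples + synthetic_upsample * synthetic_examples

# Assumption: standard LoRA scaling (alpha / rank), not a rank-stabilized variant.
lora_rank = 64
lora_alpha = 32
lora_scale = lora_alpha / lora_rank

print(f"SFT examples: {sft_total}")
print(f"LoRA scale (alpha / rank): {lora_scale}")
```

Under that assumption, an alpha of 32 with rank 64 halves the contribution of the adapter update relative to `alpha == rank`.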

### Stage 2: Reinforcement Learning (GRPO)

- **Algorithm**: Group Relative Policy Optimization (GRPO)
- **Starting checkpoint**: SFT final weights
- **Data**: 3,462 train / 385 eval examples
- **Learning rate**: 3e-5
- **Temperature**: 0.8
- **Batch size**: 16, Group size: 8
- **KL penalty**: 0.03 (against SFT reference policy)
- **Best checkpoint**: Step 130 / 217 (selected by eval reward)
- **Eval reward at best step**: 0.961
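
To make the group-based setup concrete, here is a minimal sketch of how group-relative advantages are commonly computed in GRPO: each prompt is sampled several times (8 here), and every completion's reward is compared with the other completions for the same prompt. The rewards in the example are made up, and dividing by the group standard deviation follows the original GRPO formulation; the actual training code may only center the rewards or differ in other details. The step count assumes a single pass over the training prompts with the batch size counted in prompts.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center (and scale) rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for the 8 completions sampled for one prompt.
group_rewards = [1.0, 0.9, 0.95, 0.5, 1.0, 0.8, 1.0, 0.7]
advantages = group_relative_advantages(group_rewards)

# Rough bookkeeping from the configuration above.
train_prompts = 3_462
batch_size = 16   # prompts per step
group_size = 8    # completions per prompt
steps_per_pass = math.ceil(train_prompts / batch_size)
rollouts_per_step = batch_size * group_size

print([round(a, 2) for a in advantages])
print(steps_per_pass, rollouts_per_step)
```

With these assumptions, one pass over the 3,462 training prompts takes 217 steps, which agrees with the total step count reported for the best checkpoint.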

### Reward Function

See [https://github.com/cometadata/funding-metadata-enrichment/tree/main/train](https://github.com/cometadata/funding-metadata-enrichment/tree/main/train) for the full training code.

The reward uses gated, hierarchical matching: predicted and gold funders are paired 1:1 with the Hungarian algorithm, and subordinate fields are scored only within matched funder pairs:
- **Funder name**: Fuzzy token-sort matching with acronym and containment boosts; F0.5 score, weight 0.50
- **Award IDs**: Normalized exact matching; F0.5 score, weight 0.50
- **Funding scheme**: Not weighted
- **Award title**: Not weighted
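
The sketch below illustrates the F0.5 scoring and the 0.50 / 0.50 weighting described above. It is not the project's reward code: `normalize_award_id` is a hypothetical normalization (uppercase, strip non-alphanumerics), the empty-case handling is an assumption, and the gating and Hungarian pairing steps are left out. See the linked training repository for the real implementation.

```python
import re


def normalize_award_id(award_id: str) -> str:
    """Hypothetical normalization: uppercase and drop non-alphanumeric characters."""
    return re.sub(r"[^A-Z0-9]", "", award_id.upper())


def f_beta(true_positives: int, n_predicted: int, n_gold: int, beta: float = 0.5) -> float:
    """F-beta from counts; beta < 1 weights precision more heavily than recall."""
    if n_predicted == 0 and n_gold == 0:
        return 1.0  # assumption: nothing to extract and nothing predicted is perfect
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_predicted
    recall = true_positives / n_gold
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def award_f_half(predicted_ids, gold_ids) -> float:
    """F0.5 over award IDs using exact matching after normalization."""
    predicted = {normalize_award_id(a) for a in predicted_ids}
    gold = {normalize_award_id(a) for a in gold_ids}
    return f_beta(len(predicted & gold), len(predicted), len(gold))


def combined_reward(funder_f: float, award_f: float) -> float:
    """Weighted sum of the two scored components (weights 0.50 each)."""
    return 0.50 * funder_f + 0.50 * award_f


# One spurious ID (precision 0.5, recall 1.0) versus one missed ID (precision 1.0, recall 0.5).
extra_id = award_f_half(["erc-2021-stg-101039567", "999"], ["ERC-2021-StG-101039567"])
missed_id = award_f_half(["2045678"], ["2045678", "1234567"])
print(round(extra_id, 3), round(missed_id, 3))
```

Because beta is 0.5, a hallucinated award ID costs more than a missed one, so the reward favors precise extractions over exhaustive ones.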

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base_model, "cometadata/funding-parsing-lora-Llama_3.1_8B-instruct-ep2-r64-a32-grpo")
tokenizer = AutoTokenizer.from_pretrained("cometadata/funding-parsing-lora-Llama_3.1_8B-instruct-ep2-r64-a32-grpo")

messages = [
    {"role": "system", "content": "Extract funding information from the text. Return a JSON array of funders."},
    {"role": "user", "content": "Extract funding information from the following statement:\n\nThis work was supported by the National Science Foundation (Grant No. 2045678) and the European Research Council (ERC-2021-StG-101039567)."}
]

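# return_dict=True yields both input_ids and attention_mask for generate()
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# do_sample=True is required for the temperature setting to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))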
```

Expected output:
```json
[
  {
    "funder_name": "National Science Foundation",
    "awards": [
      {
        "award_ids": ["2045678"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  },
  {
    "funder_name": "European Research Council",
    "awards": [
      {
        "award_ids": ["ERC-2021-StG-101039567"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  }
]
```
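
Downstream code usually needs the result as Python objects rather than a string. The following sketch parses output in the shape shown above and flattens it into `(funder_name, award_id)` pairs. The helper name `flatten_awards` is illustrative, and the sketch assumes the model returned valid JSON; production code should catch `json.JSONDecodeError`.

```python
import json

# Sample output in the schema shown above.
raw_output = """
[
  {"funder_name": "National Science Foundation",
   "awards": [{"award_ids": ["2045678"], "funding_scheme": [], "award_title": []}]},
  {"funder_name": "European Research Council",
   "awards": [{"award_ids": ["ERC-2021-StG-101039567"], "funding_scheme": [], "award_title": []}]}
]
"""


def flatten_awards(raw: str):
    """Return a list of (funder_name, award_id) pairs from the model's JSON output."""
    pairs = []
    for funder in json.loads(raw):
        for award in funder.get("awards", []):
            for award_id in award.get("award_ids", []):
                pairs.append((funder["funder_name"], award_id))
    return pairs


for funder_name, award_id in flatten_awards(raw_output):
    print(f"{funder_name}: {award_id}")
```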

## Training Infrastructure

Trained on [Tinker](https://thinkingmachines.ai/tinker/) by [Thinking Machines Lab](https://thinkingmachines.ai).

## Eval Results

| Step | Eval Reward | Funder F0.5 | Award F0.5 | Format Valid | KL |
|------|-------------|-------------|------------|--------------|-----|
| 0 | 0.944 | 0.961 | 0.926 | 99.7% | 0.0005 |
| 10 | 0.937 | 0.954 | 0.920 | 99.7% | 0.0015 |
| 20 | 0.950 | 0.969 | 0.932 | 100% | 0.0020 |
| 30 | 0.954 | 0.966 | 0.942 | 100% | 0.0025 |
| 40 | 0.951 | 0.971 | 0.931 | 100% | 0.0013 |
| 50 | 0.938 | 0.956 | 0.919 | 100% | 0.0051 |
| 60 | 0.949 | 0.967 | 0.931 | 100% | 0.0047 |
| 70 | 0.954 | 0.968 | 0.939 | 100% | 0.0025 |
| 80 | 0.951 | 0.962 | 0.940 | 100% | 0.0021 |
| 90 | 0.945 | 0.959 | 0.931 | 100% | 0.0026 |
| 100 | 0.943 | 0.963 | 0.923 | 99.7% | 0.0016 |
| 110 | 0.945 | 0.961 | 0.929 | 99.5% | 0.0036 |
| 120 | 0.950 | 0.964 | 0.936 | 99.5% | 0.0028 |
| **130** | **0.961** | **0.974** | **0.948** | **100%** | **0.0026** |
| 140 | 0.955 | 0.973 | 0.938 | 100% | 0.0020 |
| 150 | 0.957 | 0.972 | 0.942 | 100% | 0.0012 |
| 160 | 0.947 | 0.963 | 0.931 | 99.7% | 0.0034 |
| 170 | 0.951 | 0.957 | 0.944 | 100% | 0.0023 |
| 180 | 0.944 | 0.960 | 0.928 | 100% | 0.0013 |
| 190 | 0.933 | 0.956 | 0.910 | 99.5% | 0.0004 |
| 200 | 0.942 | 0.961 | 0.922 | 99.7% | 0.0017 |
| 210 | 0.957 | 0.967 | 0.947 | 100% | 0.0014 |
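
The released checkpoint was selected by eval reward. As a small illustration of that selection rule, the sketch below picks the step with the highest eval reward from a handful of rows copied from the table above; the variable names are illustrative.

```python
# (step, eval_reward) pairs: a subset of the rows in the table above.
eval_rows = [
    (0, 0.944),
    (50, 0.938),
    (130, 0.961),
    (150, 0.957),
    (190, 0.933),
    (210, 0.957),
]

# Select the checkpoint with the highest eval reward.
best_step, best_reward = max(eval_rows, key=lambda row: row[1])
print(f"Best checkpoint: step {best_step} (eval reward {best_reward})")
```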