---
library_name: peft
base_model: meta-llama/Llama-3.1-8B-Instruct
datasets:
- cometadata/funding-extraction-sft-data
tags:
- lora
- funding-extraction
- grpo
- rl
license: llama3.1
---

# Funding Parsing LoRA: Llama 3.1 8B Instruct + GRPO

A LoRA adapter for extracting structured funding information from funding statements in scholarly works.

## Model Details

- **Base model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Method**: Supervised fine-tuning (SFT) followed by reinforcement learning (GRPO)
- **Task**: Given a funding statement, extract a structured JSON array of funders with their award IDs
- **Training data**: [cometadata/funding-extraction-sft-data](https://huggingface.co/datasets/cometadata/funding-extraction-sft-data)

## Training Pipeline

### Stage 1: Supervised Fine-Tuning

- **Data**: [cometadata/funding-extraction-sft-data](https://huggingface.co/datasets/cometadata/funding-extraction-sft-data): 1,316 real examples (`train.jsonl`) + 2,531 synthetic examples (`synthetic.jsonl`, upsampled 2x) = 6,378 total
- **Epochs**: 2
- **LoRA rank**: 64
- **LoRA alpha**: 32
- **Learning rate**: ~2.86e-4
- **Batch size**: 128
- **Max sequence length**: 4,096 tokens
- **LR schedule**: Linear decay
- **Renderer**: llama3 (Llama 3.1 Instruct chat template)
- **Train on**: Last assistant message only

### Stage 2: Reinforcement Learning (GRPO)

- **Algorithm**: Group Relative Policy Optimization (GRPO)
- **Starting checkpoint**: SFT final weights
- **Data**: 3,462 train / 385 eval examples
- **Learning rate**: 3e-5
- **Temperature**: 0.8
- **Batch size**: 16
- **Group size**: 8
- **KL penalty**: 0.03 (against SFT reference policy)
- **Best checkpoint**: Step 130 / 217 (selected by eval reward)
- **Eval reward at best step**: 0.961

### Reward Function

See [https://github.com/cometadata/funding-metadata-enrichment/tree/main/train](https://github.com/cometadata/funding-metadata-enrichment/tree/main/train) for the full
training code.

Gated, hierarchical matching on funders, using the Hungarian algorithm to pair predicted and gold funders 1:1 so that subordinate fields are compared only within matched pairs:

- Funder name: token-sort fuzzy matching with acronym and containment boosts, F0.5 score, weight 0.50
- Award IDs: normalized exact matching, F0.5 score, weight 0.50
- Funding scheme: not weighted
- Award title: not weighted

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_id = "cometadata/funding-parsing-lora-Llama_3.1_8B-instruct-ep2-r64-a32-grpo"

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base_model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [
    {"role": "system", "content": "Extract funding information from the text. Return a JSON array of funders."},
    {"role": "user", "content": "Extract funding information from the following statement:\n\nThis work was supported by the National Science Foundation (Grant No. 2045678) and the European Research Council (ERC-2021-StG-101039567)."},
]

# return_dict=True yields both input_ids and attention_mask for generate().
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
)
# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```

Expected output:

```json
[
  {
    "funder_name": "National Science Foundation",
    "awards": [
      {
        "award_ids": ["2045678"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  },
  {
    "funder_name": "European Research Council",
    "awards": [
      {
        "award_ids": ["ERC-2021-StG-101039567"],
        "funding_scheme": [],
        "award_title": []
      }
    ]
  }
]
```

## Training Infrastructure

Trained on [Tinker](https://thinkingmachines.ai/tinker/) by [Thinking Machines Lab](https://thinkingmachines.ai).

## Eval Results

| Step | Eval Reward | Funder F0.5 | Award F0.5 | Format Valid | KL |
|------|-------------|-------------|------------|--------------|--------|
| 0 | 0.944 | 0.961 | 0.926 | 99.7% | 0.0005 |
| 10 | 0.937 | 0.954 | 0.920 | 99.7% | 0.0015 |
| 20 | 0.950 | 0.969 | 0.932 | 100% | 0.0020 |
| 30 | 0.954 | 0.966 | 0.942 | 100% | 0.0025 |
| 40 | 0.951 | 0.971 | 0.931 | 100% | 0.0013 |
| 50 | 0.938 | 0.956 | 0.919 | 100% | 0.0051 |
| 60 | 0.949 | 0.967 | 0.931 | 100% | 0.0047 |
| 70 | 0.954 | 0.968 | 0.939 | 100% | 0.0025 |
| 80 | 0.951 | 0.962 | 0.940 | 100% | 0.0021 |
| 90 | 0.945 | 0.959 | 0.931 | 100% | 0.0026 |
| 100 | 0.943 | 0.963 | 0.923 | 99.7% | 0.0016 |
| 110 | 0.945 | 0.961 | 0.929 | 99.5% | 0.0036 |
| 120 | 0.950 | 0.964 | 0.936 | 99.5% | 0.0028 |
| **130** | **0.961** | **0.974** | **0.948** | **100%** | **0.0026** |
| 140 | 0.955 | 0.973 | 0.938 | 100% | 0.0020 |
| 150 | 0.957 | 0.972 | 0.942 | 100% | 0.0012 |
| 160 | 0.947 | 0.963 | 0.931 | 99.7% | 0.0034 |
| 170 | 0.951 | 0.957 | 0.944 | 100% | 0.0023 |
| 180 | 0.944 | 0.960 | 0.928 | 100% | 0.0013 |
| 190 | 0.933 | 0.956 | 0.910 | 99.5% | 0.0004 |
| 200 | 0.942 | 0.961 | 0.922 | 99.7% | 0.0017 |
| 210 | 0.957 | 0.967 | 0.947 | 100% | 0.0014 |
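
To help read the Funder F0.5, Award F0.5, and Eval Reward columns, here is a minimal sketch of the F-beta score and an equal-weight combination of the two components. The function names `f_beta` and `combined_reward` are hypothetical, and the plain weighted sum is an assumption based on the 0.50 / 0.50 weights listed under Reward Function. The gating and Hungarian matching steps from the training repository are not reproduced here.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def combined_reward(
    funder_f: float,
    award_f: float,
    w_funder: float = 0.50,
    w_award: float = 0.50,
) -> float:
    """Weighted sum of the two component scores (assumed combination rule)."""
    return w_funder * funder_f + w_award * award_f


# With beta = 0.5, low precision costs more than low recall would.
print(round(f_beta(0.5, 1.0), 4))  # 0.5556
print(round(f_beta(1.0, 0.5), 4))  # 0.8333

# Component scores from the step 130 row of the table.
print(round(combined_reward(0.974, 0.948), 3))  # 0.961
```

Under this assumption, the step 130 component scores (0.974 and 0.948) combine to the reported eval reward of 0.961.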