---
base_model: allenai/PRIMERA
library_name: peft
license: apache-2.0
language:
- en
tags:
- base_model:adapter:allenai/PRIMERA
- lora
- transformers
- summarization
- primera
- chain-finetuning
datasets:
- billsum
- ccdv/arxiv-summarization
metrics:
- rouge
pipeline_tag: summarization
---

# PRIMERA-BillSum-arXiv (2-Stage Chain LoRA, bf16)

A LoRA adapter for [allenai/PRIMERA](https://huggingface.co/allenai/PRIMERA) trained via **2-stage sequential chain fine-tuning**: BillSum → arXiv. Training starts from a BillSum-adapted PRIMERA and continues on arXiv scientific papers.

## Model Details

- **Base model:** [allenai/PRIMERA](https://huggingface.co/allenai/PRIMERA)
- **Method:** LoRA (Low-Rank Adaptation), bf16 precision (no quantization)
- **Chain order:** BillSum → arXiv
- **Language:** English

> **Note:** Earlier docs called this "QLoRA". 4-bit quantization caused NaN gradients with the in-place attention ops in LED/Longformer, so quantization was disabled. Training is standard LoRA in bf16.

### Training Stages

| Stage | Dataset |
|-------|---------|
| 1 | BillSum |
| 2 | arXiv |

### Hyperparameters

- **LoRA rank (r):** 16
- **LoRA alpha:** 32
- **LoRA dropout:** 0.05
- **Precision:** bf16 (no quantization)
- **Target modules (LED/Longformer):**
  - Encoder: query, key, value, query_global, key_global, value_global, output
  - Decoder: q_proj, k_proj, v_proj, out_proj
  - Feed-forward: fc1, fc2

## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained("xNoper/primera-billsum-arxiv")

# Load the base model in bf16, matching the training precision.
base = AutoModelForSeq2SeqLM.from_pretrained(
    "allenai/PRIMERA", torch_dtype=torch.bfloat16
)

# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base, "xNoper/primera-billsum-arxiv")
```

## Citation

If you use this model, please also cite the underlying base model:

```bibtex
@inproceedings{xiao-etal-2022-primera,
  title = "{PRIMERA}: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization",
  author = "Xiao, Wen and Beltagy, Iz and Carenini, Giuseppe and Cohan, Arman",
  booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
  year = "2022",
}
```
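
## LoRA Scaling and Adapter Size (Sketch)

As a rough illustration of what the rank and alpha values in the Hyperparameters section imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters added per adapted linear layer. It assumes the standard `alpha / r` scaling (not rsLoRA). The layer widths `D_MODEL = 1024` and `D_FFN = 4096` are assumptions based on an LED-large-style architecture and are not stated in this card, and the helper names are hypothetical. Treat the numbers as an estimate, not a report of the actual training run.

```python
# Hyperparameters taken from this model card.
LORA_RANK = 16
LORA_ALPHA = 32

# Assumed layer widths (NOT stated in the card): LED-large-style sizes.
D_MODEL = 1024  # hidden size of attention projections (assumption)
D_FFN = 4096    # feed-forward inner size (assumption)


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
attn_params = lora_params(D_MODEL, D_MODEL, LORA_RANK)  # e.g. q_proj
fc1_params = lora_params(D_MODEL, D_FFN, LORA_RANK)     # e.g. fc1

print(f"scaling={scaling}, per-attention-proj={attn_params}, per-fc1={fc1_params}")
```

Under these assumptions, each adapted layer's low-rank update is multiplied by 2.0 before being added to the frozen weight's output. A square attention projection gains 32,768 trainable parameters, and a feed-forward layer gains 81,920.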