---
library_name: peft
tags:
- Prediction
- Summarisation
- Legal NLP
- LoRA
- Flan-T5
datasets:
- L-NLProc/PredEx_Instruction-Tuning_Pred-Exp
language:
- en
metrics:
- bertscore
- rouge
- accuracy
base_model:
- google/flan-t5-large
pipeline_tag: summarization
---

# Legal Text Summarisation & Outcome Prediction Engine

## Model Details

### Model Description

The Legal Text Summarisation model is an NLP pipeline tailored specifically for dense Indian legal documents. It uses parameter-efficient fine-tuning (PEFT) with a LoRA adapter to accomplish two primary tasks simultaneously:

1. **Binary Classification:** Predicting the final verdict of a legal case (`1 = Accepted`, `0 = Rejected`).
2. **Abstractive Summarisation:** Generating a concise explanation detailing the legal reasoning behind the verdict.

Unlike standard summarisation models, which struggle with lengthy legalese and hallucinate facts, this model is designed to process dense legal context and output precise legal reasoning, evaluated with semantic (BERTScore) and lexical (ROUGE) metrics.

- **Developed by:** Swastik Sharma
- **Model type:** Text-to-Text Generation (T5 Architecture with LoRA adapter)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from base model:** `google/flan-t5-large`

## Uses

### Direct Use

This model is intended for legal tech developers, researchers, and law students who need to quickly parse Indian legal judgments. Given the raw text of a court case, the model outputs a predicted label and a generated summary of the legal rationale.

### Out-of-Scope Use

This model **does not provide legal advice**. It is an AI research tool intended for summarisation and prediction based on historical precedent. It should not be used as a substitute for professional legal counsel or human judicial oversight.

## Bias, Risks, and Limitations

Legal documents often contain sensitive histories and societal biases.
As this model was fine-tuned on historical Indian court cases (`PredEx`), it may reflect historical biases present in the judicial system. Furthermore, generative language models can hallucinate, so the generated explanations should always be verified against the original court documents before being used in any official legal capacity.

## How to Get Started with the Model

Since this is a LoRA adapter, you need to load the base `flan-t5-large` model and attach these weights using the `peft` library.

```python
import torch
from peft import PeftModel
from transformers import T5ForConditionalGeneration, T5Tokenizer

base_model_id = "google/flan-t5-large"
peft_model_id = "Beelzi/legal-flan-t5-large-lora"

# Load the tokenizer from the adapter repository
tokenizer = T5Tokenizer.from_pretrained(peft_model_id)

# Load the base model in FP16 to save VRAM (device_map="auto" requires `accelerate`)
base_model = T5ForConditionalGeneration.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter and switch to evaluation mode
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

# Example inference
legal_text = "the High Court under sub-section (4) thereof. The expression appeal has not been defined in the CPC..."
prompt = (
    "Predict the legal outcome of this Indian court case. "
    "Output the label (0=rejected, 1=accepted) and a brief explanation.\n\n"
    f"Case: {legal_text}"
)

# Move the inputs to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Greedy decoding, capped at 256 newly generated tokens
    outputs = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
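
The prompt asks for a label and an explanation in a single generated string, so downstream code usually needs to separate the two. The sketch below is one possible way to do that, not part of the released model: `Prediction` and `split_prediction` are hypothetical names, and the helper assumes the decoded text starts with the label (optionally prefixed by the word "label"), which this card does not guarantee. Check it against real outputs before relying on it.

```python
import re
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    label: Optional[int]  # 1 = accepted, 0 = rejected, None if no leading label was found
    explanation: str


# Assumption: the label is the first token of the output, e.g. "1 ..." or "Label: 0. ..."
_LEADING_LABEL = re.compile(r"^\s*(?:label\s*[:=]?\s*)?([01])\b", re.IGNORECASE)


def split_prediction(decoded: str) -> Prediction:
    """Split decoded model output into a binary label and the explanation text."""
    text = decoded.strip()
    match = _LEADING_LABEL.match(text)
    if match is None:
        # No recognisable label: return the whole text as the explanation
        return Prediction(None, text)
    # Drop separators such as ":" or "." that follow the label
    explanation = text[match.end():].lstrip(" \t\n:.,;-")
    return Prediction(int(match.group(1)), explanation)
```

It can be applied to the decoded string from the snippet above, for example `split_prediction(tokenizer.decode(outputs[0], skip_special_tokens=True))`. Only a leading label is matched because legal text is full of stray digits (section and clause numbers) that a looser search would mistake for the verdict.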