---
base_model: meta-llama/Llama-3.1-8B
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- grit
- ner
- information-extraction
- transformers
---

## Pritish92/ner-grit-llama31-8b-lora-best

This is a **GRIT + LoRA adapter** fine-tuned from **`meta-llama/Llama-3.1-8B`** to do **instruction-following NER-style extraction** into a strict JSON list format:

```json
[{"label":"...","text":"..."}]
```

**Note:** This repository contains **adapter weights only** (not the full base model weights). You must have access to `meta-llama/Llama-3.1-8B` on Hugging Face to run it.

## Prompt format (exact)

```text
### Instruction:
{instruction}
Maintain the JSON key order exactly as shown.
Output format: [{"label":"...","text":"..."}]

### Input:
{input_chunk}

### Response:

```
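The template above can be filled in with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and the single trailing newline after `### Response:` is read off the template as shown, so adjust it if your outputs suggest the training whitespace differed.

```python
def build_prompt(instruction: str, input_chunk: str) -> str:
    """Fill the card's prompt template with an instruction and one input chunk."""
    return (
        "### Instruction:\n"
        f"{instruction}\n"
        "Maintain the JSON key order exactly as shown.\n"
        # Plain (non-f) string, so the literal braces need no escaping.
        'Output format: [{"label":"...","text":"..."}]\n'
        "\n"
        "### Input:\n"
        f"{input_chunk}\n"
        "\n"
        # Assumption: the prompt ends right after this header plus one newline.
        "### Response:\n"
    )


if __name__ == "__main__":
    print(build_prompt("Extract all person names.", "Ada Lovelace wrote the notes."))
```

Building the string by concatenation rather than `str.format` avoids having to escape the braces in the `Output format` line.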

## How to load

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

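adapter_id = "Pritish92/ner-grit-llama31-8b-lora-best"
tokenizer = AutoTokenizer.from_pretrained(adapter_id, use_fast=True)
# Reuse EOS as the pad token (the base Llama tokenizer does not define one).
tokenizer.pad_token = tokenizer.eos_token
# Pad and truncate on the left so the trailing "### Response:" marker
# always stays at the end of the input.
tokenizer.padding_side = "left"
tokenizer.truncation_side = "left"

# Reads the base model ID from the adapter config, loads the base weights,
# and attaches the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)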
model.eval()
```
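Once `model` and `tokenizer` are loaded as above, generation might look like the following sketch. `generate_response` is a hypothetical helper; greedy decoding and `max_new_tokens=1024` are assumptions (the latter mirrors the 1024 output tokens reserved during training), not settings confirmed by this card.

```python
def generate_response(model, tokenizer, prompt, max_new_tokens=1024):
    """Greedy-decode a completion and return only the newly generated text."""
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # assumption: matches the reserved output budget
        do_sample=False,                # assumption: deterministic greedy decoding
        pad_token_id=tokenizer.pad_token_id,
    )
    # generate() returns prompt + completion; drop the prompt tokens.
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

The returned string is raw model text and still needs to be parsed and validated as JSON before use.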

## Training details

- **Date**: 2026-01-02
- **Sequence length cap (`max_length`)**: 20
- **Chunking strategy**: token_overlap
  - prompt overhead tokens reserved: 256
  - output overhead tokens reserved: 1024
  - max input chunk tokens: 2048
  - overlap chunk tokens: 256
  - min chunk tokens: 256
- **Batch size**: 1
- **Gradient accumulation**: 8 (effective batch: 8)
- **Learning rate**: 5e-05
- **Planned epochs**: 2 (early stopping may stop sooner)
- **Loss masking**: response-only (prompt + input chunk tokens masked with -100)
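The chunking and loss-masking settings above can be illustrated with a short sketch. It is not the training code: `chunk_token_ids` and `mask_prompt_labels` are hypothetical names, the sliding-window stride (`max_tokens - overlap`) is an assumed reading of `token_overlap`, and the `min chunk tokens` setting is not modeled because the card does not say how it is applied.

```python
IGNORE_INDEX = -100  # label value the card says is used for masked tokens


def chunk_token_ids(token_ids, max_tokens=2048, overlap=256):
    """Split a token list into windows of at most max_tokens that share `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    if not token_ids:
        return []
    stride = max_tokens - overlap  # assumption: each window starts this far after the last
    chunks, start = [], 0
    while True:
        chunks.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return chunks


def mask_prompt_labels(prompt_ids, response_ids):
    """Build (input_ids, labels) so that only response tokens contribute to the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


if __name__ == "__main__":
    print(chunk_token_ids(list(range(10)), max_tokens=4, overlap=1))
    print(mask_prompt_labels([5, 6], [7, 8, 9]))
```

Here `prompt_ids` stands for everything up to and including the `### Response:` header (instruction plus input chunk), which is the span the card describes as masked.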

### LoRA / PEFT

- **LoRA rank (r)**: 16
- **LoRA alpha**: 32
- **LoRA dropout**: 0.1
- **Target modules**: up_proj, v_proj, down_proj, o_proj, k_proj, gate_proj, q_proj
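For a sense of adapter size, the sketch below counts LoRA parameters for these settings. The layer dimensions (hidden size 4096, MLP size 14336, 32 layers, key/value projection width 1024) are assumptions taken from the public `meta-llama/Llama-3.1-8B` configuration, not from this card, and the count is the nominal figure at rank 16: GRIT's rank adaptation may leave fewer effective ranks. The `alpha / r` scaling assumes standard LoRA scaling.

```python
R, ALPHA = 16, 32

# Assumed Llama-3.1-8B dimensions (from the base model's public config).
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r=R):
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) parameters."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in SHAPES.values())
    return per_layer * NUM_LAYERS


scaling = ALPHA / R  # factor applied to the low-rank update under standard LoRA

if __name__ == "__main__":
    print(f"nominal LoRA parameters: {lora_param_count():,}")
    print(f"scaling (alpha / r): {scaling}")
```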

### GRIT hyperparameters

- **kfac_min_samples**: 256
- **kfac_update_freq**: 100
- **kfac_damping**: 0.005
- **reprojection_warmup_steps**: 500
- **reprojection_freq**: 100
- **use_two_sided_reprojection**: True
- **rank_adaptation_start_step**: 500
- **rank_adaptation_threshold**: 0.85
- **ng_warmup_steps**: 300
- **regularizer_warmup_steps**: 500
- **lambda_kfac**: 1e-05
- **lambda_reproj**: 0.0001

## Training data

Local CSVs:
- `NER/NER-Data/ner_train_dataset.csv`
- `NER/NER-Data/ner_dev_dataset.csv`
- `NER/NER-Data/ner_test_dataset.csv`

**Example counts:** raw train = 18,115; raw validation = 2,010; train examples after chunking = 24,620
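Combining the chunked example count with the batch settings under Training details gives a rough step budget. This is a back-of-the-envelope sketch that assumes a single device and that a final partial accumulation group still produces an optimizer step; neither is stated in the card.

```python
import math

train_examples = 24_620   # train examples after chunking
batch_size = 1
grad_accum = 8
planned_epochs = 2

effective_batch = batch_size * grad_accum
# Assumption: the last, smaller accumulation group still counts as one step.
steps_per_epoch = math.ceil(train_examples / effective_batch)
planned_steps = steps_per_epoch * planned_epochs

if __name__ == "__main__":
    print(effective_batch, steps_per_epoch, planned_steps)
```

Under those assumptions one epoch is 3078 optimizer steps, so the best checkpoint reported in the Evaluation section (step 3078) would fall at the end of the first epoch.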

## Evaluation

- **Best checkpoint metric**: eval_entity_f1=0.187876 (best checkpoint: step 3078)
- **Train runtime**: 34690.8s (9h 38m 10s)
- **eval_entity_f1**: 0.187876
- **eval_entity_micro_f1**: 0.175875
- **eval_entity_parse_fail_rate**: 0.651071
- **eval_entity_precision**: 0.291457
- **eval_entity_recall**: 0.167590
- **eval_loss**: 0.138082
- **eval_runtime**: 22803.049s (6h 20m 3s)
- **eval_samples_per_second**: 0.123
- **eval_steps_per_second**: 0.031
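The evaluation script is not part of this card, so the exact definitions behind these metrics are unknown. As a rough illustration of an entity-level micro score, the sketch below assumes exact matching on `(label, text)` pairs, counts duplicates as a multiset, and treats an unparseable output as an empty prediction list. `micro_prf` is a hypothetical helper and may not reproduce the numbers above.

```python
from collections import Counter


def micro_prf(gold, pred):
    """Micro precision/recall/F1 over per-example lists of {"label", "text"} dicts."""
    tp = fp = fn = 0
    for gold_ents, pred_ents in zip(gold, pred):
        g = Counter((e["label"], e["text"]) for e in gold_ents)
        p = Counter((e["label"], e["text"]) for e in pred_ents)
        matched = sum((g & p).values())  # multiset intersection = true positives
        tp += matched
        fp += sum(p.values()) - matched
        fn += sum(g.values()) - matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [[{"label": "PER", "text": "Ada"}, {"label": "LOC", "text": "London"}]]
    pred = [[{"label": "PER", "text": "Ada"}, {"label": "ORG", "text": "London"}]]
    print(micro_prf(gold, pred))
```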

## Limitations / notes

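- Outputs are **not guaranteed** to be valid JSON. The reported `eval_entity_parse_fail_rate` is 0.651071, which suggests roughly 65% of evaluation outputs failed to parse, so validate every output and handle parse failures explicitly.
- Performance depends on how closely your entity schema and labels match those seen in the training data.
- `meta-llama/Llama-3.1-8B` is a gated model: request access on Hugging Face and authenticate (for example, with `huggingface-cli login`) before downloading it.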
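Given how often outputs may fail to parse, a defensive parser is worth having. The following is a minimal sketch using only the standard library; `parse_entities` is a hypothetical helper, and its policy (take the first JSON array in the text, drop malformed items, return `None` on failure) is one reasonable choice rather than the behavior of the evaluation code.

```python
import json


def parse_entities(raw):
    """Return a list of {"label", "text"} dicts, or None if no JSON list can be parsed."""
    start = raw.find("[")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        data, _ = json.JSONDecoder().raw_decode(raw[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    entities = []
    for item in data:
        # Keep only well-formed items with string "label" and "text" fields.
        if (
            isinstance(item, dict)
            and isinstance(item.get("label"), str)
            and isinstance(item.get("text"), str)
        ):
            entities.append({"label": item["label"], "text": item["text"]})
    return entities


if __name__ == "__main__":
    print(parse_entities('[{"label":"PER","text":"Ada"}] extra tokens'))
    print(parse_entities('[{"label":"PER","text":"Ada"'))
```

Returning `None` for a failed parse, rather than an empty list, keeps "the model found no entities" distinguishable from "the output was unusable".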