Pritish92/ner-grit-llama31-8b-lora-latest
This is a GRIT + LoRA adapter fine-tuned from meta-llama/Llama-3.1-8B to do instruction-following NER-style extraction into a strict JSON list format:
[{"label":"...","text":"..."}]
Note: This repository contains adapter weights only (not the full base model weights). You must have access to meta-llama/Llama-3.1-8B on Hugging Face to run it.
Prompt format (exact)
### Instruction:
{instruction}
Maintain the JSON key order exactly as shown.
Output format: [{"label":"...","text":"..."}]
### Input:
{input_chunk}
### Response:
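The template above can be filled in programmatically. The following is a minimal sketch, not the author's code: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the exact whitespace (single newlines, no blank lines between sections) is an assumption read off the layout shown above.

```python
# Hypothetical helper; the newline placement is an assumption based on the
# prompt layout shown in this card.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "{instruction}\n"
    "Maintain the JSON key order exactly as shown.\n"
    'Output format: [{{"label":"...","text":"..."}}]\n'
    "### Input:\n"
    "{input_chunk}\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_chunk: str) -> str:
    """Fill the instruction and the input chunk into the prompt template."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        input_chunk=input_chunk.strip(),
    )


if __name__ == "__main__":
    print(build_prompt("Extract all PERSON entities.", "Alice met Bob in Paris."))
```

If generations look off, comparing this string against the formatting used at training time is a reasonable first check, since the card calls the format exact.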
How to load
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
adapter_id = "Pritish92/ner-grit-llama31-8b-lora-latest"
tokenizer = AutoTokenizer.from_pretrained(adapter_id, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
tokenizer.truncation_side = "left"
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
Training details
- Date: 2026-01-02
- Sequence length cap (max_length): 20
- Chunking strategy: token_overlap
- prompt overhead tokens reserved: 256
- output overhead tokens reserved: 1024
- max input chunk tokens: 2048
- overlap chunk tokens: 256
- min chunk tokens: 256
- Batch size: 1
- Gradient accumulation: 8 (effective batch: 8)
- Learning rate: 5e-05
- Planned epochs: 2 (early stopping may stop sooner)
- Loss masking: response-only (prompt + input chunk tokens masked with -100)
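The chunking and loss-masking settings listed above can be illustrated with a small sketch. This is not the training code: `chunk_token_ids` and `build_labels` are hypothetical names, and the handling of `min_tokens` (widening the final window backwards so it is never shorter than the minimum) is an assumption, since the card does not say how short trailing chunks were treated.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def chunk_token_ids(ids, max_tokens=2048, overlap=256, min_tokens=256):
    """Split token ids into windows of at most max_tokens that share `overlap` tokens.

    Assumption: a final window shorter than min_tokens is widened backwards.
    """
    if not 0 <= overlap < max_tokens or min_tokens > max_tokens:
        raise ValueError("need 0 <= overlap < max_tokens and min_tokens <= max_tokens")
    if len(ids) <= max_tokens:
        return [list(ids)]
    stride = max_tokens - overlap
    chunks, start = [], 0
    while True:
        end = start + max_tokens
        if end >= len(ids):
            # Last window: make sure it holds at least min_tokens tokens.
            start = max(0, min(start, len(ids) - min_tokens))
            chunks.append(list(ids[start:]))
            return chunks
        chunks.append(list(ids[start:end]))
        start += stride


def build_labels(prompt_ids, response_ids):
    """Response-only loss: prompt positions get -100, response positions keep their ids."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


if __name__ == "__main__":
    print(chunk_token_ids(list(range(10)), max_tokens=4, overlap=1, min_tokens=2))
    print(build_labels([1, 2, 3], [4, 5]))
```

With the defaults taken from the list above (2048, 256, 256), consecutive windows start 1792 tokens apart.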
LoRA / PEFT
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.1
- Target modules: up_proj, k_proj, gate_proj, o_proj, q_proj, v_proj, down_proj
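For orientation, the settings above imply a LoRA scaling factor of alpha / r = 2.0. The sketch below also estimates the adapter's parameter count, assuming the published Llama-3.1-8B shapes (hidden size 4096, MLP size 14336, 32 layers, key/value projection width 1024) and a fixed rank of 16 on every target module. GRIT's rank adaptation may change the effective ranks, so treat the total as a rough figure rather than a measured one.

```python
R, ALPHA = 16, 32
# Assumed Llama-3.1-8B dimensions (from the base model's public config).
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) of each targeted linear layer.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, layers):
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * layers


scaling = ALPHA / R
total = lora_param_count(SHAPES, R, LAYERS)

if __name__ == "__main__":
    print(f"scaling={scaling}, lora_params={total:,}")
```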
GRIT hyperparameters
- kfac_min_samples: 256
- kfac_update_freq: 100
- kfac_damping: 0.005
- reprojection_warmup_steps: 500
- reprojection_freq: 100
- use_two_sided_reprojection: True
- rank_adaptation_start_step: 500
- rank_adaptation_threshold: 0.85
- ng_warmup_steps: 300
- regularizer_warmup_steps: 500
- lambda_kfac: 1e-05
- lambda_reproj: 0.0001
Training data
Local CSVs:
- NER/NER-Data/ner_train_dataset.csv
- NER/NER-Data/ner_dev_dataset.csv
- NER/NER-Data/ner_test_dataset.csv
Example counts: raw train=18,115, raw val=2,010; after chunking train examples=24,620
Evaluation
- Best checkpoint metric: N/A
- Train runtime: 34690.8s (9h 38m 10s)
- eval_entity_f1: 0.173705
- eval_entity_micro_f1: 0.162234
- eval_entity_parse_fail_rate: 0.686071
- eval_entity_precision: 0.270288
- eval_entity_recall: 0.155745
- eval_loss: 0.198197
- eval_runtime: 23856.426600
- eval_samples_per_second: 0.117000
- eval_steps_per_second: 0.029000
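The card does not define how the eval_entity_* metrics are computed. As a point of reference only, the sketch below shows one common definition: micro-averaged precision, recall and F1 over exact-match (label, text) pairs, with unparseable outputs counted as empty predictions. `entity_prf` is a hypothetical name, and this definition is an assumption that may differ from the one behind the numbers above.

```python
from collections import Counter


def entity_prf(gold, pred):
    """Micro P/R/F1 over exact (label, text) matches.

    gold: one list of (label, text) tuples per example.
    pred: same shape, with None marking an output that failed to parse.
    """
    tp = fp = fn = failures = 0
    for g, p in zip(gold, pred):
        if p is None:  # parse failure: every gold entity is missed
            failures += 1
            p = []
        g_counts, p_counts = Counter(g), Counter(p)
        matched = sum((g_counts & p_counts).values())
        tp += matched
        fp += sum(p_counts.values()) - matched
        fn += sum(g_counts.values()) - matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fail_rate = failures / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "parse_fail_rate": fail_rate}


if __name__ == "__main__":
    gold = [[("PER", "Alice"), ("ORG", "Acme")], [("LOC", "Paris")]]
    pred = [[("PER", "Alice"), ("ORG", "ACME")], None]
    print(entity_prf(gold, pred))
```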
Limitations / notes
- Outputs are not guaranteed to be valid JSON (the evaluation above reports an eval_entity_parse_fail_rate of 0.686071); validate/parse and handle failures robustly.
- Model performance depends on the entity schema/labels in your training data.
- If meta-llama/Llama-3.1-8B is gated, you must authenticate to download it.
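Because generations may not be valid JSON, a defensive parser is worth having. The sketch below uses only the standard library: it looks for the first `[` in the generated text, decodes one JSON value from there, and keeps only items that carry string "label" and "text" fields. `parse_entities` is a hypothetical helper, not part of this repository.

```python
import json


def parse_entities(raw: str):
    """Return a list of {"label", "text"} dicts, or None if no JSON list can be decoded."""
    start = raw.find("[")
    if start == -1:
        return None
    try:
        # raw_decode tolerates trailing text after the first complete JSON value.
        data, _ = json.JSONDecoder().raw_decode(raw[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    entities = []
    for item in data:
        # Silently skip items that do not match the expected schema.
        if (
            isinstance(item, dict)
            and isinstance(item.get("label"), str)
            and isinstance(item.get("text"), str)
        ):
            entities.append({"label": item["label"], "text": item["text"]})
    return entities


if __name__ == "__main__":
    print(parse_entities('[{"label":"PER","text":"Alice"}] trailing tokens'))
    print(parse_entities('[{"label":"PER","text":'))
```

Returning None for a parse failure, as opposed to an empty list for "no entities", lets callers count failures separately and retry or fall back as needed.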