Subimal10/indian-legal-data-cleaned
How to use Subimal10/llama3b-legal-sft with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Subimal10/llama3b-legal-sft")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Subimal10/llama3b-legal-sft", dtype="auto")

How to use Subimal10/llama3b-legal-sft with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Subimal10/llama3b-legal-sft"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Subimal10/llama3b-legal-sft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
# Or use Docker:
docker model run hf.co/Subimal10/llama3b-legal-sft
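
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server is already running on `localhost:8000`; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM or this repository.

```python
import json
import urllib.request

MODEL_ID = "Subimal10/llama3b-legal-sft"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]

# Example (needs the server to be up):
# print(complete("Draft a formal eviction notice under Indian law."))
```

The same helper should work against any server that exposes the same OpenAI-compatible endpoint by passing a different `base_url`, for example `http://localhost:30000`.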
How to use Subimal10/llama3b-legal-sft with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Subimal10/llama3b-legal-sft" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Subimal10/llama3b-legal-sft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
# Or start the SGLang server with Docker:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Subimal10/llama3b-legal-sft" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Subimal10/llama3b-legal-sft",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

How to use Subimal10/llama3b-legal-sft with Docker Model Runner:
docker model run hf.co/Subimal10/llama3b-legal-sft
A LoRA adapter fine-tuned on Meta Llama-3.2-3B-Instruct with 4-bit quantization.
Task: Draft Indian-law documents (eviction notices, affidavits, show-cause notices, leases, powers of attorney, etc.)
Base model: meta-llama/Llama-3.2-3B-Instruct
Dataset: Hashif/indianlegal-llama-2
Training: SFTTrainer, fp16, batch=4→16, max_steps=20,000

| Metric | Value |
|---|---|
| Perplexity | 1.53 |
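
For readers who want to relate the perplexity figure to a loss value, here is a small sketch of the usual conversion. It assumes the reported perplexity is the exponential of the mean per-token cross-entropy in nats, which is the common convention; the card does not state how or on which split the number was computed, and the function names are illustrative.

```python
import math


def perplexity_from_loss(mean_nll):
    """Perplexity is exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll)


def loss_from_perplexity(perplexity):
    """Inverse of the conversion above."""
    return math.log(perplexity)


reported_ppl = 1.53  # value from the table above
implied_loss = loss_from_perplexity(reported_ppl)
print(f"implied mean cross-entropy: {implied_loss:.3f} nats/token")  # 0.425
```

Under that assumption, a perplexity of 1.53 corresponds to a mean cross-entropy of roughly 0.425 nats per token.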
Inference speed on A100: ~0.5 it/s at batch size 1
“✅ Eviction notice generated by this model was reviewed and approved by Advocate Abhishek Chatterjee.”
from transformers import AutoTokenizer, BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel
import os
HF_TOKEN = os.getenv("HF_TOKEN") # or set directly "hf_xxx"
REPO_ID = "Subimal10/llama3b-legal-sft"
# 1️⃣ Load tokenizer + base model in 4-bit + LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
bnb_cfg = BitsAndBytesConfig(load_in_4bit=True)
base = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    quantization_config=bnb_cfg,
    device_map="auto",
    trust_remote_code=True,
    token=HF_TOKEN,
)
model = PeftModel.from_pretrained(base, REPO_ID, device_map="auto", token=HF_TOKEN)
model.eval()
# 2️⃣ Inference with an instruction prompt
prompt = (
    "<s>[INST] <<SYS>>\n"
    "You are a senior contract lawyer.\n"
    "<</SYS>>\n\n"
    "### Instruction:\n"
    "Draft a formal Show Cause Notice under Indian contract law to a contractor for delays in project delivery.\n"
    "### Response:\n"
    "[/INST] "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
gen_ids = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
completion = tokenizer.decode(gen_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print("=== Show Cause Notice ===\n", completion)
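
To reuse the prompt layout from the example above for other documents, the template can be wrapped in a small helper. This is only a sketch: `build_prompt` is a hypothetical name, and it simply reproduces the string used above with the system message and instruction made configurable.

```python
def build_prompt(instruction, system="You are a senior contract lawyer."):
    """Return a prompt in the same layout as the Show Cause Notice example."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n\n"
        "### Instruction:\n"
        f"{instruction}\n"
        "### Response:\n"
        "[/INST] "
    )


# Build a prompt for a different document type.
eviction_prompt = build_prompt(
    "Draft a formal eviction notice to a tenant for non-payment of rent."
)
print(eviction_prompt)
```

The resulting string can be passed to `tokenizer(...)` and `model.generate(...)` exactly as in the example above.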