---
{
  "language": ["en"],
  "license": "llama2",
  "tags": [
    "text-generation",
    "causal-lm",
    "supervised-fine-tuning",
    "instruction-tuning",
    "synthetic-qa",
    "lora",
    "axolotl",
    "deepspeed",
    "transformers",
    "llava",
    "eu-hpc"
  ],
  "datasets": [
    "axolotl_deduplicated_synthetic_qa"
  ],
  "metrics": [
    "loss"
  ],
  "library_name": "transformers",
  "framework": "pytorch",
  "base_model": "llava-hf/llava-1.5-7b-hf",
  "model_name": "llava-7b-sft",
  "pipeline_tag": "text-generation",
  "task_categories": ["text-generation", "question-answering"],
  "model_type": "llava",
  "inference": {
    "parameters": {
      "max_new_tokens": 512,
      "temperature": 0.7,
      "top_p": 0.9
    }
  },
  "trained_on": [
    "Leonardo EuroHPC"
  ],
  "description": "Supervised fine-tuning (SFT) of LLaVA 1.5 7B on synthetic QA pairs using Axolotl and DeepSpeed ZeRO-1. The model improves text-based question answering and instruction following while preserving its multimodal capabilities."
}
---

# LLaVA 7B — Supervised Fine-Tuning (SFT) on Synthetic QA

**Model type:** Vision-Language Causal Model (text-finetuned LLaVA-1.5)  
**Base model:** [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)  
**License:** Llama 2 Community License  
**Framework:** Axolotl + DeepSpeed ZeRO-1 (PyTorch 2.5.1 + CUDA 12.1)

---

## Overview

`llava-7b-sft` is a **supervised fine-tuned** version of **LLaVA 1.5 7B**, trained on a synthetic instruction-following dataset of **question–answer pairs** to improve text understanding and reasoning.  
Although the base model is multimodal, this SFT run fine-tunes the **language model component** with LoRA adapters, which were then **merged into the full model weights**.

Because the adapters are merged, the model supports **text-only generation** directly in `transformers` (no PEFT required) and remains compatible with the **multimodal processor and vision configuration** from LLaVA.
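
Merging folds each low-rank update into the weight it adapts, which is why no adapter library is needed at inference time. The sketch below illustrates the standard LoRA merge rule, `W' = W + (alpha / r) * B @ A`, on tiny hand-written matrices. The matrix sizes and values are illustrative assumptions, not taken from this checkpoint, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: out=2, in=3, r=1, alpha=2 (scale 2.0, the same ratio as r=16, alpha=32).
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]  # r x in
lora_b = [[0.5], [-1.0]]    # out x r

merged = merge_lora(w, lora_a, lora_b, alpha=2, r=1)

# The merged weight gives the same output as base weight plus adapter path.
x = [[1.0], [1.0], [1.0]]
base_plus_adapter = [
    [b[0] + 2.0 * a[0]]
    for b, a in zip(matmul(w, x), matmul(lora_b, matmul(lora_a, x)))
]
merged_output = matmul(merged, x)
```

After merging, a single matrix multiply replaces the base-plus-adapter computation, so the merged checkpoint loads like any ordinary model.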

Training was conducted on the **Leonardo EuroHPC** system using **Axolotl** and **DeepSpeed ZeRO-1**.

---

## Training Setup

| Component | Specification |
|:-----------|:--------------|
| **Objective** | Supervised fine-tuning (instruction-following QA) |
| **Adapter type** | LoRA (merged into full model) |
| **Precision** | bfloat16 |
| **Hardware** | 8 nodes × 2 NVIDIA A100 64 GB GPUs per node (16 GPUs total) |
| **Framework** | Axolotl 0.6 + DeepSpeed ZeRO-1 (PyTorch 2.5.1 + CUDA 12.1) |
| **Runtime** | ~24 hours |
| **Checkpoints** | 2 per epoch |
| **Vision tower** | Frozen during SFT |
| **Dataset split** | 70% train / 30% validation |

---

## Dataset

**Name:** `axolotl_deduplicated_synthetic_qa.jsonl`  
**Type:** Instruction-following synthetic QA dataset (Alpaca-style)  

Each record contains a single-turn question and a generated answer.  
The SFT data is intended to improve the model’s **reasoning**, **language coherence**, and **conversational QA** quality.
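
To make the record layout concrete, here is a minimal sketch of reading one JSONL line. It assumes the common Alpaca field names (`instruction`, `input`, `output`); the dataset's actual schema is not shown in this card, and the sample record and the `to_question_and_answer` helper are hypothetical.

```python
import json

# Hypothetical record; field names follow the common Alpaca convention (an assumption).
line = json.dumps({
    "instruction": "What is the boiling point of water at sea level?",
    "input": "",
    "output": "100 degrees Celsius.",
})


def to_question_and_answer(raw_line):
    """Parse one JSONL line into a (question, answer) pair."""
    record = json.loads(raw_line)
    question = record["instruction"]
    # Alpaca-style records may carry optional extra context in `input`.
    if record.get("input"):
        question = f"{question}\n\n{record['input']}"
    return question, record["output"]


question, answer = to_question_and_answer(line)
```

During SFT, the question forms the prompt and the answer is the target the model learns to generate.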

---

## Hyperparameters

| Parameter | Value |
|:-----------|:------|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 4 |
| Epochs | 1 |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 10 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Validation set size | 0.3 |
| Evals per epoch | 2 |
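
A few derived quantities follow from the values above. This is a back-of-the-envelope sketch under stated assumptions: 16 data-parallel GPUs (reading the hardware row as 8 nodes with 2 GPUs each), and Llama-7B-style language model dimensions (hidden size 4096, MLP size 11008, 32 layers), which are not listed in this card. It also counts adapters on the language model only, since the vision tower is frozen.

```python
# Values from the hyperparameter table.
micro_batch_size = 1
grad_accum_steps = 4
sequence_length = 2048
lora_r, lora_alpha = 16, 32

# Assumption: 8 nodes x 2 GPUs, all used as data-parallel ranks.
num_gpus = 8 * 2

# Sequences consumed per optimizer step across all GPUs.
effective_batch = micro_batch_size * grad_accum_steps * num_gpus
# Upper bound on tokens per optimizer step (sequences may be shorter than the limit).
max_tokens_per_step = effective_batch * sequence_length

# Factor applied to the low-rank update B @ A.
lora_scale = lora_alpha / lora_r

# Assumption: Llama-7B-style dimensions for the language model.
hidden, intermediate, num_layers = 4096, 11008, 32
target_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each adapted matrix adds A (r x in) and B (out x r): r * (in + out) parameters.
params_per_layer = sum(lora_r * (d_in + d_out) for d_in, d_out in target_shapes.values())
lora_trainable_params = params_per_layer * num_layers

print(effective_batch, max_tokens_per_step, lora_scale, lora_trainable_params)
```

Under these assumptions, the run trains roughly 40 million adapter parameters, well under 1% of a 7B-parameter language model.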

---

## Tokenizer & Processor

| Component | Description |
|:-----------|:-------------|
| **Tokenizer type** | `AutoTokenizer` |
| **Processor type** | `AutoProcessor` (compatible with LLaVA image+text inputs) |
| **Pad token** | `<pad>` (ID 32001) |
| **Chat template** | `llava` |

The processor configuration accepts both image and text inputs, but this release was tuned on text-only data.
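
For text-only use, prompts can be built as plain strings. The sketch below mirrors the `USER: ... ASSISTANT:` layout used in the Usage Example section; the helper names are hypothetical, and the exact string produced by the bundled `llava` chat template may differ.

```python
def build_prompt(question: str) -> str:
    """Wrap a single-turn question in the USER/ASSISTANT layout."""
    return f"USER: {question.strip()}\nASSISTANT:"


def extract_answer(generated_text: str) -> str:
    """Return the text after the first ASSISTANT: marker, or the whole text if absent."""
    return generated_text.split("ASSISTANT:", 1)[-1].strip()


prompt = build_prompt("Explain the principle of energy conservation.")
answer = extract_answer(prompt + " Energy is neither created nor destroyed.")
```

`extract_answer` is useful because decoded generations typically include the prompt followed by the model's continuation.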

---

## Files Included

This repository contains the **fully merged model weights** and all required configs for direct use with `transformers`:

- `config.json`  
- `model-*.safetensors`  
- `tokenizer.json`  
- `tokenizer_config.json`  
- `tokenizer.model`  
- `special_tokens_map.json`  
- `processor_config.json`  
- `preprocessor_config.json`  
- `vision_config.json`  
- `image_processor_config.json`  
- `README.md`

---

## Usage Example

To run text-based generation with this model:

```python
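import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "ubitech-edg/llava-7b-sft"

# The checkpoint declares the `llava` model type, so load it with the LLaVA class;
# AutoModelForCausalLM is not expected to resolve this architecture.
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "USER: Explain the principle of energy conservation.\nASSISTANT:"
# Send inputs to the device chosen by device_map instead of hard-coding "cuda".
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,  # temperature and top_p only take effect when sampling
        temperature=0.7,
        top_p=0.9,
    )

print(processor.decode(outputs[0], skip_special_tokens=True))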
```