---
language:
- en
license: llama2
tags:
- text-generation
- causal-lm
- supervised-fine-tuning
- instruction-tuning
- synthetic-qa
- lora
- axolotl
- deepspeed
- transformers
- llava
- eu-hpc
datasets:
- axolotl_deduplicated_synthetic_qa
metrics:
- loss
library_name: transformers
framework: pytorch
base_model: llava-hf/llava-1.5-7b-hf
model_name: llava-7b-sft
pipeline_tag: text-generation
task_categories:
- text-generation
- question-answering
model_type: llava
inference:
  parameters:
    max_new_tokens: 512
    temperature: 0.7
    top_p: 0.9
trained_on:
- Leonardo EuroHPC
description: "Supervised fine-tuning (SFT) of LLaVA 1.5 7B on synthetic QA pairs using Axolotl and DeepSpeed ZeRO-1. The model improves text-based question answering and instruction following while preserving its multimodal capabilities."
---

# LLaVA 7B: Supervised Fine-Tuning (SFT) on Synthetic QA

**Model type:** Vision-language causal model (LLaVA-1.5 with a text-fine-tuned language model)  
**Base model:** [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)  
**License:** Llama 2 Community License  
**Framework:** Axolotl + DeepSpeed ZeRO-1 (PyTorch 2.5.1 + CUDA 12.1)

---

## Overview

`llava-7b-sft` is a **supervised fine-tuned** version of **LLaVA 1.5 7B**, trained on a synthetic instruction-following dataset of **question–answer pairs** to improve text understanding and reasoning.

Although the base model is multimodal, this SFT run fine-tunes only the **language model component**, using LoRA adapters that were afterwards **merged into the full model weights**. The model can therefore be loaded and used for **text-only generation** directly with `transformers`, without PEFT, and it keeps the **multimodal processor and vision configuration** inherited from LLaVA.

Training was conducted on the **Leonardo EuroHPC** system using **Axolotl** and **DeepSpeed ZeRO-1**.
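
To make "merged into the full model weights" concrete: standard LoRA learns two low-rank factors per target module, B (d_out × r) and A (r × d_in), and adds their scaled product to the frozen weight, so merging computes W' = W + (α / r)·B·A once and then drops the adapter. The pure-Python sketch below assumes standard (non-rsLoRA) scaling and uses hypothetical toy matrices of rank 2 with α = 4, chosen only to reproduce the α / r = 2 ratio of this run (r = 16, α = 32). In practice, libraries such as PEFT expose this step as `merge_and_unload()` on the real tensors.

```python
# Toy illustration of merging a LoRA update into a frozen weight matrix.
# All values are hypothetical; real projection matrices in the 7B language
# model are thousands of rows and columns wide.


def matmul(a, b):
    """Multiply matrices stored as lists of rows (a is m x n, b is n x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), i.e. the merged weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Frozen base weight W (3 x 3).
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# LoRA factors: B is 3 x rank and A is rank x 3, with rank = 2.
lora_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
lora_a = [[0.25, 0.0, 0.0], [0.0, 0.0, 0.5]]
ALPHA, RANK = 4, 2  # alpha / rank = 2.0, the same ratio as r = 16, alpha = 32
scale = ALPHA / RANK

merged = merge_lora(w, lora_b, lora_a, ALPHA, RANK)

# The merged layer must give the same output as "base path + scaled adapter path".
x = [[1.0], [2.0], [3.0]]
unmerged_out = [
    [base[0] + scale * upd[0]]
    for base, upd in zip(matmul(w, x), matmul(lora_b, matmul(lora_a, x)))
]
merged_out = matmul(merged, x)

print(merged)
print(merged_out == unmerged_out)
```

Because the merged matrix has the same shape as the original weight, inference costs nothing extra and no adapter files are needed at load time.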

---

## Training Setup

| Component | Specification |
|:-----------|:--------------|
| **Objective** | Supervised fine-tuning (instruction-following QA) |
| **Adapter type** | LoRA (merged into full model) |
| **Precision** | bfloat16 |
| **Hardware** | 8 nodes × 2 NVIDIA A100 64 GB GPUs per node (16 GPUs total) |
| **Framework** | Axolotl 0.6 + DeepSpeed ZeRO-1 (PyTorch 2.5.1 + CUDA 12.1) |
| **Runtime** | ~24 hours |
| **Checkpoints** | 2 per epoch |
| **Vision tower** | Frozen during SFT |
| **Dataset split** | 70% train / 30% validation |

---

## Dataset

**Name:** `axolotl_deduplicated_synthetic_qa.jsonl`  
**Type:** Instruction-following synthetic QA dataset (Alpaca-style)

Each record contains a single-turn question and a generated answer. The SFT data targets the model's **reasoning**, **language coherence**, and **conversational QA** quality.

---

## Hyperparameters

| Parameter | Value |
|:-----------|:------|
| Sequence length | 2048 |
| Micro batch size | 1 |
| Gradient accumulation | 4 |
| Epochs | 1 |
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Optimizer | AdamW (8-bit) |
| Warmup steps | 10 |
| Weight decay | 0.0 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Gradient checkpointing | ✅ |
| Flash attention | ✅ |
| Validation set size | 0.3 (30% of records) |
| Evals per epoch | 2 |

---

## Tokenizer & Processor

| Component | Description |
|:-----------|:-------------|
| **Tokenizer type** | `AutoTokenizer` |
| **Processor type** | `AutoProcessor` (compatible with LLaVA image+text inputs) |
| **Pad token** | `<pad>` (ID 32001) |
| **Chat template** | `llava` |

The processor accepts image and text inputs; however, this release was fine-tuned on text only.
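
The batch and schedule entries in the Training Setup and Hyperparameters tables combine as in the following sketch. It assumes that all 16 GPUs act as data-parallel ranks (ZeRO-1 partitions optimizer state across ranks but keeps data parallelism) and that the `cosine` scheduler follows the standard Hugging Face shape of linear warmup followed by a half-cosine decay to zero. `TOTAL_STEPS` is hypothetical, because the dataset size is not reported here.

```python
import math

# Values taken from the Training Setup and Hyperparameters tables.
NUM_NODES = 8
GPUS_PER_NODE = 2
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
SEQUENCE_LEN = 2048
BASE_LR = 2e-4
WARMUP_STEPS = 10

# Assumption: every GPU is a data-parallel rank, so one optimizer step
# aggregates gradients from this many sequences.
world_size = NUM_NODES * GPUS_PER_NODE
effective_batch = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * world_size
max_tokens_per_step = effective_batch * SEQUENCE_LEN  # upper bound per optimizer step


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero over the remaining steps."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total number of optimizer steps for the single epoch.
TOTAL_STEPS = 110

print(f"effective batch: {effective_batch} sequences, <= {max_tokens_per_step} tokens/step")
for s in (0, 5, 10, 60, 110):
    print(f"step {s:3d}: lr = {cosine_lr(s, TOTAL_STEPS):.2e}")
```

With these settings the learning rate peaks at 2e-4 after 10 steps and reaches half of that at the midpoint of the post-warmup phase.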

---

## Files Included

This repository contains the **fully merged model weights** and all configuration files required for direct use with `transformers`:

- `config.json`
- `model-*.safetensors`
- `tokenizer.json`
- `tokenizer_config.json`
- `tokenizer.model`
- `special_tokens_map.json`
- `processor_config.json`
- `preprocessor_config.json`
- `vision_config.json`
- `image_processor_config.json`
- `README.md`

---

## Usage Example

To run text-based generation with this model:

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "ubitech-edg/llava-7b-sft"

# The processor bundles the tokenizer (text) and the image processor (images).
processor = AutoProcessor.from_pretrained(model_id)

# The checkpoint uses the LLaVA architecture (model_type "llava"), so it is
# loaded with the LLaVA model class provided by transformers.
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "USER: Explain the principle of energy conservation.\nASSISTANT:"

# Text-only input: no image is passed, so no pixel values are produced.
inputs = processor(text=prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,  # required for temperature and top_p to take effect
        temperature=0.7,
        top_p=0.9,
    )

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```
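
Because the training data is described as Alpaca-style single-turn QA, a small helper can turn such a record into the same `USER: ... ASSISTANT:` prompt string used in the usage example. This is a sketch under stated assumptions: the field names `instruction` and `input` follow the common Alpaca convention and are not confirmed for this dataset, `build_prompt` is a hypothetical helper, and the exact template Axolotl applied during training may differ.

```python
import json


def build_prompt(record: dict) -> str:
    """Format an Alpaca-style record as a single-turn LLaVA-style text prompt.

    Assumes the common Alpaca keys: "instruction" (required) and "input"
    (optional extra context). The "output" field is the target answer and is
    not part of the prompt.
    """
    question = record["instruction"].strip()
    context = record.get("input", "").strip()
    if context:
        question = f"{question}\n\n{context}"
    return f"USER: {question}\nASSISTANT:"


# Hypothetical JSONL line; the actual dataset contents are not published here.
line = (
    '{"instruction": "What is the boiling point of water at sea level?", '
    '"input": "", "output": "100 degrees Celsius."}'
)
record = json.loads(line)
prompt = build_prompt(record)
print(prompt)
```

The resulting string can be passed to `processor(text=prompt, return_tensors="pt")` in place of the hard-coded prompt above.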