---
base_model: aitf-komdigi/KomdigiITS-8B-DFK-CPT
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:aitf-komdigi/KomdigiITS-8B-DFK-CPT
- lora
- sft
- transformers
- trl
- unsloth
- dfk-detection
- vlm
- indonesian
- multimodal
- image-classification
- content-moderation
---
A LoRA adapter fine-tuned on aitf-komdigi/KomdigiITS-8B-DFK-CPT (Ministral-3-8B-Base-2512 based) as a Vision-Language Model for multimodal content classification. The model analyzes social media screenshots and classifies them into four categories: netral, disinformasi, fitnah, and ujaran kebencian.
Trained using the SITA framework with Unsloth's SFT pipeline. Given an image, the model produces a structured analysis with a classification label and a detailed Indonesian-language reasoning of any violations found.
final-ministral-8b-cpt-ws3), trained on the DFK VLM Dataset V3 with augmented train/val splits. The base model (aitf-komdigi/KomdigiITS-8B-DFK-CPT) underwent continual pretraining on DFK-domain text before this fine-tuning.
netral, disinformasi, fitnah, or ujaran kebencian) and a detailed reasoning in Indonesian.
Evaluated on the held-out validation split using greedy decoding (temperature=0.0) and BERTScore (bert-base-multilingual-cased).
- Dataset: dfk_vlm_dataset_v3 (augmented on fitnah class)
- Trainer: unsloth_vlm_sft (Unsloth VLM SFT trainer)
- Instruction part: [INST]
- Response part: [/INST]
- Metric for best model: eval_loss (lower is better)

Each sample is formatted as a multi-turn conversation using the ministral_3 chat template. The dataset builds structured content blocks, which the Jinja template renders as:
<s>[SYSTEM_PROMPT]...default Ministral system prompt...[/SYSTEM_PROMPT][INST]Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis detail mengenai pelanggaran yang ditemukan.
Ringkasan: {ringkasan}
Klaim: {klaim}
Fakta: {fakta}[IMG][/INST]Label: {label}
Analisis: {analisis}</s>
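The rendered response above follows a two-field layout (`Label:` then `Analisis:`), so generated text can be split back into a label and a reasoning string. The following is a minimal sketch, assuming the model reproduces that layout at inference time; `parse_response`, `ParsedOutput`, and `LABELS` are hypothetical helper names and are not part of the SITA framework.

```python
import re
from typing import NamedTuple, Optional

# The four categories named in this model card.
LABELS = ("netral", "disinformasi", "fitnah", "ujaran kebencian")


class ParsedOutput(NamedTuple):
    label: Optional[str]  # None when the label is missing or not one of LABELS
    analysis: str


# "Label: <label>" on one line, followed by "Analisis: <free text>".
_PATTERN = re.compile(
    r"Label:\s*(?P<label>.*?)\s*\n\s*Analisis:\s*(?P<analysis>.*)",
    re.DOTALL,
)


def parse_response(text: str) -> ParsedOutput:
    """Split a generated response into (label, analysis)."""
    # Drop the end-of-sequence marker if the decoder left it in the string.
    text = text.replace("</s>", "").strip()
    match = _PATTERN.search(text)
    if match is None:
        # Layout not recognised: keep the raw text, report no label.
        return ParsedOutput(None, text)
    label = match.group("label").strip().lower()
    if label not in LABELS:
        label = None
    return ParsedOutput(label, match.group("analysis").strip())


if __name__ == "__main__":
    sample = "Label: fitnah\nAnalisis: Konten menuduh tanpa bukti.</s>"
    print(parse_response(sample))
```

Returning `None` for an unrecognised label, rather than guessing the nearest category, keeps malformed generations visible to downstream evaluation code.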
"Tidak ditemukan sumber yang valid."
netral, disinformasi, fitnah, or ujaran kebencian.
experiment_name: final-ministral-8b-cpt-ws3
seed: 3407
reporting:
  wandb: true
  wandb_project: "DFK3"
model:
  name: unsloth_vlm
  pretrained: aitf-komdigi/KomdigiITS-8B-DFK-CPT
  kwargs:
    load_in_4bit: false
    chat_template: "sita/templates/ministral_3.jinja"
adapter:
  name: unsloth_vlm_lora
  kwargs:
    finetune_vision_layers: true
    finetune_language_layers: true
    finetune_attention_modules: true
    finetune_mlp_modules: true
    r: 16
    lora_alpha: 16
    lora_dropout: 0.1
    bias: "none"
    target_modules: "all-linear"
    use_gradient_checkpointing: "unsloth"
    random_state: 3407
dataset:
  name: dfk_vlm_dataset_v3
  kwargs:
    data_dir: /content/dataset/images/images
training:
  num_epochs: 3
  batch_size: 4
  learning_rate: 5e-4
  gradient_accumulation_steps: 4
  max_grad_norm: 1
  warmup_ratio: 0.03
  weight_decay: 0
  logging_steps: 1
  eval_steps: 250
  extra:
    seed: 3407
    max_length: 4096
    load_best_model_at_end: true
    metric_for_best_model: eval_loss
    greater_is_better: false
trainer:
  name: unsloth_vlm_sft
  kwargs:
    train_on_responses_only: true
    instruction_part: "[INST]"
    response_part: "[/INST]"
    optim: adamw_8bit
evaluation:
  name: vlm_gen
  kwargs:
    max_new_tokens: 512
    temperature: 0.0
    bert_model: bert-base-multilingual-cased
    batch_size: 16
    num_workers: 11
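
Two quantities implied by the configuration above can be worked out directly: the effective batch size per optimizer step and the LoRA scaling factor. The sketch below is illustrative only. It assumes standard LoRA scaling (`lora_alpha / r`, not rank-stabilized) and a single training device, neither of which is stated in this card; the function names are hypothetical.

```python
# Values copied from the training configuration above.
BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
LORA_R = 16
LORA_ALPHA = 16


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step.

    num_devices defaults to 1, which is an assumption; the card does not
    say how many devices were used.
    """
    return per_device * accum_steps * num_devices


def lora_scale(alpha: int, r: int) -> float:
    """Multiplier applied to the low-rank update under standard LoRA scaling."""
    return alpha / r


if __name__ == "__main__":
    print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16
    print(lora_scale(LORA_ALPHA, LORA_R))  # 1.0
```

With `r` equal to `lora_alpha`, the adapter update is applied unscaled, so the learning rate is the main control on how strongly the adapter shifts the base weights.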