This model was fine-tuned on image-text pairs at 1024px resolution and handles NSFW tasks well, with limited hallucination.
Training was done in full FP32 precision with the AdamW optimizer (no 8-bit optimizers).
This model has limited video captioning ability.
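For readers unfamiliar with the distinction between full-FP32 AdamW and 8-bit optimizers, the following is a minimal PyTorch sketch of such a setup. It is not the training script used for this model: the `build_fp32_adamw` helper, the toy `nn.Linear` stand-in, and the `lr` and `weight_decay` values are all hypothetical and chosen only for illustration.

```python
import torch
from torch import nn


def build_fp32_adamw(model: nn.Module, lr: float = 1e-5, weight_decay: float = 0.01):
    """Cast a model to FP32 and attach a standard (non-quantized) AdamW optimizer.

    The lr and weight_decay defaults are placeholder values, not the ones
    used to train this model.
    """
    # Keep every weight in full 32-bit floating point.
    model = model.to(torch.float32)
    # torch.optim.AdamW stores its moment estimates in the parameter dtype
    # (FP32 here), whereas 8-bit optimizers quantize that state to save memory.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    return model, optimizer


# Toy stand-in for the real network, used only to exercise one update step.
toy, opt = build_fp32_adamw(nn.Linear(8, 2))
loss = toy(torch.randn(4, 8)).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()

# Collect the dtype of the first-moment buffers held by the optimizer.
state_dtypes = {s["exp_avg"].dtype for s in opt.state.values()}
```

The trade-off is memory: FP32 weights plus two FP32 moment buffers per parameter cost considerably more than a quantized optimizer state, in exchange for avoiding quantization error in the updates.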