Tags: Text-to-Image, Diffusers, Safetensors, English, Chinese, QwenImagePipeline, nf4, Abliterated, Qwen2.5-VL7b-Abliterated, instruct, Transformers, uncensored, image-to-image, image-generation
How to use lhca521/QWEN_IMAGE_nf4_w_AbliteratedTE_Diffusers with Diffusers:
```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("lhca521/QWEN_IMAGE_nf4_w_AbliteratedTE_Diffusers", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

license: apache-2.0
language:
- en
- zh
- th
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
tags:
- text-generation-inference
- uncensored
- image-captioning
- vlm
- visual-understanding
- caption
- image-to-text
pipeline_tag: image-text-to-text
library_name: transformers
datasets:
- prithivMLmods/blip3o-caption-mini-arrow
- prithivMLmods/Caption3o-Opt-v2

# **Qwen2.5-VL-7B-Abliterated-Caption-it**

> The **Qwen2.5-VL-7B-Abliterated-Caption-it** model is a fine-tuned version of **Qwen2.5-VL-7B-Instruct**, tailored for **Abliterated Captioning** / **Uncensored Image Captioning**. This variant is designed to generate highly detailed, descriptive captions across a broad range of visual categories, including images with complex, sensitive, or nuanced content, at varying aspect ratios and resolutions.

# Key Highlights

* **Abliterated / Uncensored Captioning**: Fine-tuned to bypass common content filters while preserving factual and descriptive richness across diverse visual categories.
* **High-Fidelity Descriptions**: Generates comprehensive captions for general, artistic, technical, abstract, and low-context images.
* **Robust Across Aspect Ratios**: Captions images with wide, tall, square, and irregular dimensions.
* **Variational Detail Control**: Produces both high-level summaries and fine-grained descriptions as needed.
* **Built on the Qwen2.5-VL Architecture**: Leverages the strengths of the Qwen2.5-VL-7B multimodal model for visual reasoning, comprehension, and instruction-following.
* **Multilingual Output Capability**: Supports multilingual descriptions (English by default); the output language can be changed through the prompt.

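The detail-level and language controls described above are driven by the prompt, so a small helper can keep the prompt wording in one place. The following is a minimal sketch, not part of the model's API: `build_caption_messages`, `DETAIL_PROMPTS`, and the prompt wordings are hypothetical names chosen for illustration. The optional `max_pixels` key on the image entry is assumed to be read by `qwen_vl_utils.process_vision_info` as a per-image resolution budget, as in the upstream Qwen2.5-VL examples.

```python
from typing import Optional

# Hypothetical prompt wordings; only "detailed" echoes the instruction query
# given in this card. Adjust them to taste.
DETAIL_PROMPTS = {
    "brief": "Provide a one-sentence caption for the image.",
    "detailed": "Provide a detailed caption for the image.",
}


def build_caption_messages(
    image: str,
    detail: str = "detailed",
    language: Optional[str] = None,
    max_pixels: Optional[int] = None,
) -> list:
    """Build a single-turn chat message list for one image (illustrative helper)."""
    if detail not in DETAIL_PROMPTS:
        raise ValueError(f"detail must be one of {sorted(DETAIL_PROMPTS)}")

    prompt = DETAIL_PROMPTS[detail]
    if language:
        # English is the default output language; other languages are
        # requested in the prompt itself.
        prompt += f" Write the caption in {language}."

    image_entry = {"type": "image", "image": image}
    if max_pixels is not None:
        # Assumption: qwen_vl_utils honors this per-image resolution cap.
        image_entry["max_pixels"] = max_pixels

    return [
        {
            "role": "user",
            "content": [image_entry, {"type": "text", "text": prompt}],
        }
    ]


messages = build_caption_messages(
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    detail="brief",
    language="Chinese",
    max_pixels=1280 * 28 * 28,
)
print(messages[0]["content"][1]["text"])
# Provide a one-sentence caption for the image. Write the caption in Chinese.
```

The returned list has the same shape as the `messages` variable in the Quick Start, so it can be passed to `processor.apply_chat_template` and `process_vision_info` unchanged.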
# Training Details

This model was fine-tuned using the following datasets:

* **[prithivMLmods/blip3o-caption-mini-arrow](https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow)**
* **[prithivMLmods/Caption3o-Opt-v2](https://huggingface.co/datasets/prithivMLmods/Caption3o-Opt-v2)**
* **Private/unlisted datasets** curated for uncensored and domain-specific image captioning tasks.

The training objective focused on improving unconstrained, descriptive image captioning, especially for edge cases that standard captioning benchmarks commonly filter out.

# Quick Start with Transformers

> [!note]
> Instruction Query: Provide a detailed caption for the image

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the captioning model; device_map="auto" places the weights on the available device(s).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen2.5-VL-7B-Abliterated-Caption-it")

# One user turn: an image followed by the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Render the chat template to text and collect the image/video inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device the model was loaded on.
inputs = inputs.to(model.device)

# Generate, then drop the prompt tokens so only the new caption tokens are decoded.
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

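The trimming step in the Quick Start is needed because `model.generate` returns each prompt followed by the newly generated tokens. A minimal sketch of that slicing, using plain Python lists in place of tensors, is shown here; the token IDs are made up for illustration and `trim_prompt_tokens` is a hypothetical helper name.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
    ]


# Two prompts of equal (padded) length, with made-up token IDs.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
# generate() echoes each prompt and appends the new tokens.
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 55]]

trimmed = trim_prompt_tokens(prompt_batch, generated_batch)
print(trimmed)  # [[42, 43], [55]]
```

Only the trimmed sequences are passed to `processor.batch_decode`, so the decoded caption does not repeat the instruction text. Note that `max_new_tokens=128` caps the caption length; long, detailed captions may need a larger value.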
# Intended Use

This model is suited for:

* Generating detailed and unfiltered image captions for general-purpose or artistic datasets.
* Content moderation research, red-teaming, and generative safety evaluations.
* Enabling descriptive captioning for visual datasets typically excluded from mainstream models.
* Use in creative applications (e.g., storytelling, art generation) that benefit from rich descriptive captions.
* Captioning for non-standard aspect ratios and stylized visual content.

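For the dataset-captioning use case, captions are often stored next to the images in a JSON Lines file. The sketch below assumes a flat folder of images and a caller-supplied `caption_fn` (hypothetical) that wraps the Quick Start inference code. The `file_name` and `text` keys follow the `metadata.jsonl` layout used by the Hugging Face `datasets` image-folder loader; other layouts would work equally well.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder: Path) -> List[Path]:
    """Return the image files directly inside `folder`, sorted by name."""
    return sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


def write_captions(
    images: Iterable[Path],
    caption_fn: Callable[[Path], str],
    out_path: Path,
) -> int:
    """Write one JSON record per image and return the number of records."""
    count = 0
    with out_path.open("w", encoding="utf-8") as fh:
        for image_path in images:
            record = {"file_name": image_path.name, "text": caption_fn(image_path)}
            # ensure_ascii=False keeps non-English captions readable in the file.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A typical call would be `write_captions(find_images(Path("images")), caption_fn, Path("images/metadata.jsonl"))`, where `caption_fn` runs the model on one image path and returns the decoded caption string.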
# Limitations

* May produce explicit, sensitive, or offensive descriptions depending on image content and prompts.
* Not suitable for deployment in production systems requiring content filtering or moderation.
* Can exhibit variability in caption tone or style depending on input prompt phrasing.
* Accuracy for unfamiliar or synthetic visual styles may vary.