| --- |
| license: gpl-3.0 |
| language: |
| - en |
| base_model: |
| - lmms-lab/llava-onevision-qwen2-7b-ov |
| pipeline_tag: image-text-to-text |
| tags: |
| - radioastronomy |
| --- |
| # radiollava-7b-qacapt |
|
|
| https://arxiv.org/abs/2503.23859 |
|
|
| radiollava is a domain-specialized vision-language AI assistant for radio astronomy research, in particular for running |
| radio source analysis tasks on radio-continuum images. It was trained on ~1.5M user-assistant conversations about ~55k radio |
| images taken from several radio surveys, including ASKAP-EMU, MeerKAT SMGPS and VLA FIRST, and on a set of ~38k image-caption pairs extracted |
| from arXiv papers (2000-2025) selected by keywords on radio astronomy topics and techniques. |
|
|
| ## Model Details |
|
|
| - **Base Architecture**: llava-onevision |
| - **Base Model**: llava-onevision-qwen2-7b-ov |
| - **Parameters**: 7 billion |
| - **Domain**: Radio Astronomy |
| - **License**: GPL 3.0 License |
| - **Development Process**: Supervised Fine-tuning (SFT) on QA pairs |
|
|
| ## Using the model |
| To use this model, you need to install LLaVA-NeXT as described in this repository: |
|
|
| `https://github.com/LLaVA-VL/LLaVA-NeXT` |
|
|
| LLaVA-NeXT requires an outdated version of the `transformers` library (v4.40.0). |
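
Because of this version pin, it can help to check the environment before loading the model. The snippet below is a minimal sketch using only the Python standard library; the helper names `installed_version` and `matches_required` are hypothetical and not part of LLaVA-NeXT, and the exact-match policy is an assumption (nearby patch releases may also work).

```python
from importlib import metadata
from typing import Optional

# Version stated in this model card as required by LLaVA-NeXT.
REQUIRED_TRANSFORMERS = "4.40.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_required(found: Optional[str], required: str = REQUIRED_TRANSFORMERS) -> bool:
    """Hypothetical helper: exact match on the public version, ignoring any local suffix."""
    if found is None:
        return False
    # Drop a local version label such as "+cu121" before comparing.
    return found.split("+")[0] == required


found = installed_version("transformers")
if matches_required(found):
    print(f"transformers {found} matches the required version")
else:
    print(f"transformers {found!r} found, expected {REQUIRED_TRANSFORMERS}")
```

Running this in the environment where LLaVA-NeXT is installed reports whether the pinned version is active before the heavier model-loading step is attempted.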
|
|
| To load the model: |
|
|
| ```python |
| from llava.model.builder import load_pretrained_model |
| tokenizer, model, image_processor, max_length = load_pretrained_model( |
| model_name_or_path="inaf-oact-ai/radiollava-7b-qacapt", |
| model_base=None, |
| model_name="llava_qwen", |
| device_map="auto" |
| ) |
| ``` |
|
|
| To run model inference on an input image: |
|
|
| ```python |
| import copy |
| import torch |
| from PIL import Image |
| from llava.model.builder import load_pretrained_model |
| from llava.mm_utils import process_images, tokenizer_image_token |
| from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN |
| from llava.conversation import conv_templates |
| # - Load model |
| tokenizer, model, image_processor, max_length = load_pretrained_model( |
| model_name_or_path="inaf-oact-ai/radiollava-7b-qacapt", |
| model_base=None, |
| model_name="llava_qwen", |
| device_map="auto" |
| ) |
| # - Load image |
| image_path = ...  # path to the input image |
| image = Image.open(image_path).convert("RGB") |
| # - Process image |
| image_tensor = process_images([image], image_processor, model.config) |
| image_tensor = [_image.to(dtype=torch.float16, device=model.device) for _image in image_tensor] |
| # - Create prompt |
| query= "Describe the input image" # Replace it with your query |
| question = DEFAULT_IMAGE_TOKEN + "\n" + query |
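| conv_template = "qwen_1_5"  # conversation template used in the LLaVA-OneVision tutorials for Qwen2-based models |
| conv = copy.deepcopy(conv_templates[conv_template]) |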
| conv.system= '<|im_start|>system\nYou are an AI assistant specialized in radio astronomical topics.' |
| conv.append_message(conv.roles[0], question) |
| conv.append_message(conv.roles[1], None) |
| prompt_question = conv.get_prompt() |
| # - Create model inputs |
| input_ids = tokenizer_image_token(prompt_question, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device) |
| image_sizes = [image.size] |
| # - Generate model response |
| # Change generation parameters as you wish |
| do_sample = True |
| temperature = 0.3 |
| max_new_tokens = 4096 |
| output = model.generate( |
| input_ids, |
| images=image_tensor, |
| image_sizes=image_sizes, |
| do_sample=do_sample, |
| temperature=temperature if do_sample else None, |
| max_new_tokens=max_new_tokens, |
| ) |
| output_parsed= tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=False) |
| |
| # - Process response as you wish ... |
| #response= output_parsed.strip("\n").strip() |
| ``` |
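
The generation settings and the final clean-up step in the example above can be collected into two small helpers. This is only a sketch: `build_generation_kwargs` and `clean_response` are hypothetical names, not part of LLaVA-NeXT, and the defaults simply mirror the values used in the example.

```python
from typing import Any, Dict


def build_generation_kwargs(
    do_sample: bool = True,
    temperature: float = 0.3,
    max_new_tokens: int = 4096,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments passed to model.generate()."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if do_sample and temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "do_sample": do_sample,
        # As in the example above, temperature is only set when sampling.
        "temperature": temperature if do_sample else None,
        "max_new_tokens": max_new_tokens,
    }


def clean_response(text: str) -> str:
    """Strip surrounding newlines and spaces from the decoded model output."""
    return text.strip("\n").strip()
```

With these in place, the generation call would read `model.generate(input_ids, images=image_tensor, image_sizes=image_sizes, **build_generation_kwargs(do_sample=False))` for greedy decoding, followed by `clean_response(output_parsed)`.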
|
|
| See the tutorials available in the LLaVA-NeXT repository: |
|
|
| `https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_OneVision_Tutorials.ipynb` |
|
|
| Further usage examples are provided in this repository: |
|
|
| `https://github.com/SKA-INAF/radio-llava.git` |