Mineral Nano 1 Vision is a compact, efficient vision-language model with multimodal capabilities, designed for fast inference in low-resource environments.
pip install transformers pillow torch requests
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import requests
# Load the processor (image preprocessing + tokenizer) and the model
model_name = "Luke-Bergen/mineral-nano-1"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name)
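# Load an image from a URL (replace with a real image URL)
url = "https://example.com/image.jpg"
http_response = requests.get(url, stream=True, timeout=30)
http_response.raise_for_status()
image = Image.open(http_response.raw).convert("RGB")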
# Prepare inputs
prompt = "<image>What is in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt")
# Generate response
outputs = model.generate(**inputs, max_new_tokens=100)
response = processor.decode(outputs[0], skip_special_tokens=True)
print(response)
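With many decoder-only vision-language models in `transformers`, `model.generate` returns the prompt token ids followed by the newly generated ids, so the decoded text can echo the prompt. Whether Mineral Nano 1 Vision behaves this way is an assumption to check against real output. A minimal sketch of the usual fix, slicing off the first `prompt_len` ids before decoding, with plain Python lists standing in for tensors; `strip_prompt_tokens` is a hypothetical helper, and with the real objects the equivalent slice would be `outputs[0][inputs["input_ids"].shape[1]:]`.

```python
def strip_prompt_tokens(output_ids, prompt_len):
    """Return only the newly generated token ids.

    Hypothetical helper: assumes generate() returned the prompt ids
    followed by the generated ids, as decoder-only models usually do.
    """
    if prompt_len < 0 or prompt_len > len(output_ids):
        raise ValueError("prompt_len must be between 0 and len(output_ids)")
    return output_ids[prompt_len:]


# Toy example: 4 prompt ids followed by 3 generated ids.
prompt_ids = [101, 7, 8, 9]
full_output = prompt_ids + [42, 43, 102]
new_ids = strip_prompt_tokens(full_output, len(prompt_ids))
print(new_ids)  # [42, 43, 102]
```

The sliced ids can then be passed to `processor.decode(..., skip_special_tokens=True)` as in the example above.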
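# Multiple images: the prompt contains one <image> placeholder for each image
from PIL import Image

images = [
    Image.open("image1.jpg").convert("RGB"),
    Image.open("image2.jpg").convert("RGB"),
]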
prompt = "<image>Describe the first image. <image>Now describe the second image."
inputs = processor(text=prompt, images=images, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0], skip_special_tokens=True))
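The multi-image example uses one `<image>` placeholder per image, in the same order as the `images` list. Assuming the processor expects that one-to-one correspondence (an assumption, not stated explicitly in this card), a cheap check before calling `processor(...)` catches mismatches early. `check_image_placeholders` is a hypothetical helper, not part of `transformers`.

```python
IMAGE_TOKEN = "<image>"  # placeholder string used in the examples above


def check_image_placeholders(prompt, images):
    """Raise ValueError if the number of <image> placeholders in the
    prompt differs from the number of images supplied.

    Hypothetical helper; assumes a one-to-one placeholder/image mapping.
    """
    n_placeholders = prompt.count(IMAGE_TOKEN)
    if n_placeholders != len(images):
        raise ValueError(
            f"prompt has {n_placeholders} {IMAGE_TOKEN} placeholder(s) "
            f"but {len(images)} image(s) were given"
        )
    return n_placeholders


prompt = "<image>Describe the first image. <image>Now describe the second image."
# Strings stand in for PIL images here; only the count matters.
print(check_image_placeholders(prompt, ["img1", "img2"]))  # 2
```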
# Chat-style input; the {"type": "image"} entry marks where the image goes
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What objects are in this image?"},
        ],
    }
]
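# Apply the chat template to build the prompt string
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# `image` is the PIL image loaded in the first example
inputs = processor(text=text, images=image, return_tensors="pt")
# temperature only takes effect when sampling is enabled (do_sample=True)
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
print(processor.decode(outputs[0], skip_special_tokens=True))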
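The chat-template example marks the image position with a `{"type": "image"}` entry in the user turn's `content` list. A minimal sketch of a hypothetical helper, `build_user_message`, that produces the same structure for any number of images followed by a text question. Whether this model's chat template accepts several image entries in one turn is an assumption to verify against the processor.

```python
def build_user_message(question, num_images=1):
    """Build a user chat message in the format used above: image entries
    first, then the text question (hypothetical helper)."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    content = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": question})
    return {"role": "user", "content": content}


messages = [build_user_message("What objects are in this image?")]
print(messages)
```

With `num_images=1`, the result matches the hand-written `messages` list above and can be passed to `processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in the same way.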
from PIL import Image
# Load local image
image = Image.open("path/to/your/image.jpg")
prompt = "<image>Describe what you see in detail."
inputs = processor(text=prompt, images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0], skip_special_tokens=True))
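To run the same prompt over a folder of local images, one approach is to collect the file paths first, then open and process each one as in the local-image example above. A minimal standard-library sketch; `find_images` and the extension set are assumptions for illustration, not part of the model's API.

```python
import tempfile
from pathlib import Path

# Assumed set of file extensions to treat as images (case-insensitive).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_images(folder):
    """Return image file paths in `folder` (non-recursive), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.PNG", "a.jpg", "notes.txt"]:
        (Path(tmp) / name).touch()
    print([p.name for p in find_images(tmp)])  # ['a.jpg', 'b.PNG']
```

Each returned path can then be opened with `Image.open(path)` and passed to the processor together with the prompt.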
- Image description and captioning
- Visual question answering
- Object detection and recognition
- Scene understanding
- Multi-image reasoning
- OCR and text extraction from images
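All of the capabilities listed above are driven by the text prompt. A small, hypothetical mapping from task to prompt can keep prompts consistent across calls. The captioning and object prompts below reuse wording from the examples above; the OCR wording is an illustrative assumption, not a prompt documented for this model.

```python
# Hypothetical task -> prompt templates; each starts with the <image>
# placeholder used in the examples above.
TASK_PROMPTS = {
    "caption": "<image>Describe what you see in detail.",
    "vqa": "<image>{question}",
    "ocr": "<image>Extract all text from this image.",  # assumed wording
    "objects": "<image>What objects are in this image?",
}


def make_prompt(task, **fields):
    """Fill in the template for `task`; raises KeyError for unknown tasks."""
    return TASK_PROMPTS[task].format(**fields)


print(make_prompt("vqa", question="How many people are in the photo?"))
# <image>How many people are in the photo?
```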
This model is designed for:
Images are automatically:
Use use_cache=True for faster generation.
License: [Specify your license - e.g., MIT, Apache 2.0, etc.]
@misc{mineral-nano-1-vision,
author = {Luke Bergen},
title = {Mineral Nano 1 Vision: A Compact Vision-Language Model},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/Luke-Bergen/mineral-nano-1}
}
For questions or issues, please open an issue on the model repository.
This model builds upon research in vision transformers and multimodal learning.
Base model: deepseek-ai/DeepSeek-OCR