Instructions for using gabriellarson/LFM2-VL-450M-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use gabriellarson/LFM2-VL-450M-GGUF with Transformers:

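# Use a pipeline as a high-level helper
# Note: this repository hosts GGUF files. If Transformers cannot load it directly,
# use the original checkpoint LiquidAI/LFM2-VL-450M (see "How to run LFM2-VL" below).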
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="gabriellarson/LFM2-VL-450M-GGUF")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("gabriellarson/LFM2-VL-450M-GGUF", dtype="auto")

llama-cpp-python

How to use gabriellarson/LFM2-VL-450M-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

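# Assumption to check: image inputs in llama-cpp-python usually need the vision
# projector (mmproj) GGUF loaded through a multimodal chat handler; this call
# loads only the language-model weights.
llm = Llama.from_pretrained(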
	repo_id="gabriellarson/LFM2-VL-450M-GGUF",
	filename="LFM2-VL-450M-F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use gabriellarson/LFM2-VL-450M-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M

Use Docker

docker model run hf.co/gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M


vLLM

How to use gabriellarson/LFM2-VL-450M-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "gabriellarson/LFM2-VL-450M-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gabriellarson/LFM2-VL-450M-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
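The same request can be sent from Python instead of curl. Below is a minimal sketch using only the standard library; the helper names (`to_data_url`, `build_chat_payload`, `post_chat`) are hypothetical, and it assumes the server accepts OpenAI-style `image_url` parts, including base64 data URLs for local image files.

```python
import base64
import json
import urllib.request


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL usable in an image_url part."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


def build_chat_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same body as the curl example: one user turn with text + image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `post_chat("http://localhost:8000", build_chat_payload("gabriellarson/LFM2-VL-450M-GGUF", "Describe this image in one sentence.", url))` targets the vLLM server started above. The same payload should also work with the SGLang server on port 30000 and, assuming the vision projector is loaded, with llama-server, whose default port is 8080.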


SGLang

How to use gabriellarson/LFM2-VL-450M-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "gabriellarson/LFM2-VL-450M-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gabriellarson/LFM2-VL-450M-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "gabriellarson/LFM2-VL-450M-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gabriellarson/LFM2-VL-450M-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use gabriellarson/LFM2-VL-450M-GGUF with Ollama:
```
ollama run hf.co/gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M
```

Unsloth Studio

How to use gabriellarson/LFM2-VL-450M-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gabriellarson/LFM2-VL-450M-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for gabriellarson/LFM2-VL-450M-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for gabriellarson/LFM2-VL-450M-GGUF to start chatting

Docker Model Runner
How to use gabriellarson/LFM2-VL-450M-GGUF with Docker Model Runner:
```
docker model run hf.co/gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M
```

Lemonade

How to use gabriellarson/LFM2-VL-450M-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull gabriellarson/LFM2-VL-450M-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.LFM2-VL-450M-GGUF-Q4_K_M

List all available models

lemonade list

LFM2‑VL

LFM2‑VL is Liquid AI's first series of multimodal models, designed to process text and images at variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications.

We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters.

2× faster inference on GPUs than existing VLMs, while maintaining competitive accuracy
Flexible architecture with user-tunable speed-quality tradeoffs at inference time
Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion

Find out more about our vision-language model in the LFM2-VL post, and about its language backbone in the LFM2 blog post.

📄 Model details

Due to their small size, we recommend fine-tuning LFM2-VL models on narrow use cases to maximize performance. They were trained for instruction following and lightweight agentic flows. They are not intended for safety‑critical decisions.

Property	LFM2-VL-450M	LFM2-VL-1.6B
Parameters (LM only)	350M	1.2B
Vision encoder	SigLIP2 NaFlex base (86M)	SigLIP2 NaFlex shape‑optimized (400M)
Backbone layers	hybrid conv+attention	hybrid conv+attention
Context (text)	32,768 tokens	32,768 tokens
Image tokens	dynamic, user‑tunable	dynamic, user‑tunable
Vocab size	65,536	65,536
Precision	bfloat16	bfloat16
License	LFM Open License v1.0	LFM Open License v1.0

Supported languages: English

Generation parameters: we recommend the following settings:

Text: temperature=0.1, min_p=0.15, repetition_penalty=1.05
Vision: min_image_tokens=64, max_image_tokens=256, do_image_splitting=True
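One way to keep these recommendations together in Python is sketched below. The dictionary names are hypothetical, and `do_sample=True` is an added assumption: in Transformers, `temperature` and `min_p` only take effect when sampling, not under greedy decoding.

```python
# Recommended text-generation settings from this model card.
# do_sample=True is an assumption: temperature and min_p are only used when sampling.
TEXT_GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.1,
    "min_p": 0.15,
    "repetition_penalty": 1.05,
}

# Recommended vision (image-processing) settings from this model card.
VISION_KWARGS = {
    "min_image_tokens": 64,
    "max_image_tokens": 256,
    "do_image_splitting": True,
}

# Sanity check: the image-token budget must be a valid range.
assert VISION_KWARGS["min_image_tokens"] <= VISION_KWARGS["max_image_tokens"]
```

With the Transformers example in "How to run LFM2-VL", the text settings could be passed as `model.generate(**inputs, **TEXT_GENERATION_KWARGS, max_new_tokens=64)`. How the vision settings reach the image processor (for example, as keyword arguments to `AutoProcessor.from_pretrained`) is an assumption to verify against the processor's documentation.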

Chat template: LFM2-VL uses a ChatML-like chat template as follows:

<|startoftext|><|im_start|>system
You are a helpful multimodal assistant by Liquid AI.<|im_end|>
<|im_start|>user
<image>Describe this image.<|im_end|>
<|im_start|>assistant
This image shows a Caenorhabditis elegans (C. elegans) nematode.<|im_end|>

Images are referenced with a sentinel (<image>), which is automatically replaced with the image tokens by the processor.

You can apply it using the dedicated .apply_chat_template() method from Hugging Face transformers.
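To make the layout above concrete, here is a minimal pure-Python sketch that reproduces it for plain-text message contents. `render_lfm2_vl_prompt` is a hypothetical helper that only mimics the format shown; in practice, `apply_chat_template` and the processor do this work, including replacing the `<image>` sentinel with image tokens, which this sketch does not do.

```python
def render_lfm2_vl_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper: lay out messages in the ChatML-like format shown above.

    messages: list of {"role": ..., "content": str} dicts. An "<image>" sentinel
    in the content is left as-is (the real processor expands it).
    """
    parts = ["<|startoftext|>"]
    for message in messages:
        # Each turn: <|im_start|>role, newline, content, <|im_end|>, newline.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_lfm2_vl_prompt([
    {"role": "system", "content": "You are a helpful multimodal assistant by Liquid AI."},
    {"role": "user", "content": "<image>Describe this image."},
])
```

With `add_generation_prompt=True`, the rendered string ends with an open `<|im_start|>assistant` turn, matching the template up to the point where the model starts its reply.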

Architecture

Hybrid backbone: Language model tower (LFM2-1.2B or LFM2-350M) paired with SigLIP2 NaFlex vision encoders (400M shape-optimized or 86M base variant)
Native resolution processing: Handles images up to 512×512 pixels without upscaling and preserves non-standard aspect ratios without distortion
Tiling strategy: Splits large images into non-overlapping 512×512 patches and includes thumbnail encoding for global context (in the 1.6B model)
Efficient token mapping: 2-layer MLP connector with pixel unshuffle reduces image tokens (e.g., 256×384 image → 96 tokens, 1000×3000 → 1,020 tokens)
Inference-time flexibility: User-tunable maximum image tokens and patch count for speed/quality tradeoff without retraining
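The token arithmetic behind the 256×384 → 96 example can be sketched as follows. This assumes 16×16-pixel patches from the SigLIP2 NaFlex encoder and a pixel-unshuffle factor of 2 (each 2×2 block of patch embeddings folded into one token); both values are assumptions chosen because they reproduce that example, not figures stated in this card. The helpers are hypothetical and use plain Python lists instead of tensors.

```python
PATCH_SIZE = 16       # assumption: 16x16-pixel patches from the vision encoder
UNSHUFFLE_FACTOR = 2  # assumption: each 2x2 block of patch embeddings becomes one token


def pixel_unshuffle(grid, factor=UNSHUFFLE_FACTOR):
    """Fold each factor x factor block of feature vectors into one longer vector.

    grid: list of rows, each row a list of feature vectors (lists of floats).
    Returns a grid `factor` times smaller per spatial dimension, whose
    feature vectors are `factor**2` times longer.
    """
    h, w = len(grid), len(grid[0])
    if h % factor or w % factor:
        raise ValueError("grid height and width must be divisible by factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            merged = []
            for di in range(factor):
                for dj in range(factor):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        out.append(row)
    return out


def image_tokens_single_tile(height, width, patch=PATCH_SIZE, factor=UNSHUFFLE_FACTOR):
    """Token count for one image that needs no tiling: patches, then unshuffle."""
    patches_h, patches_w = height // patch, width // patch
    return (patches_h // factor) * (patches_w // factor)


# 256x384 pixels -> 16x24 patches -> 8x12 tokens after a 2x2 unshuffle.
tokens = image_tokens_single_tile(256, 384)
```

This covers only a single image that needs no tiling. It does not reproduce the 1000×3000 → 1,020 figure, which depends on the tiling and resizing logic described above.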

Training approach

Builds on the LFM2 base model with joint mid-training that fuses vision and language capabilities using a gradually adjusted text-to-image ratio
Applies joint SFT with emphasis on image understanding and vision tasks
Leverages large-scale open-source datasets combined with in-house synthetic vision data, selected for balanced task coverage
Follows a progressive training strategy: base model → joint mid-training → supervised fine-tuning

🏃 How to run LFM2-VL

You can run LFM2-VL with Hugging Face transformers v4.55 or later as follows:

pip install -U transformers pillow

Here is an example of how to generate an answer with transformers in Python:

from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image
# Load model and processor
model_id = "LiquidAI/LFM2-VL-450M"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
# Load image and create conversation
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = load_image(url)
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What is in this image?"},
        ],
    },
]
# Generate Answer
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)
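outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens (generate() also returns the prompt)
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
answer = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(answer)
# This image depicts a vibrant street scene in what appears to be a Chinatown or similar cultural area. The focal point is a large red stop sign with white lettering, mounted on a pole.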

You can directly run and test the model with this Colab notebook.

🔧 How to fine-tune

We recommend fine-tuning LFM2-VL models on your use cases to maximize performance.

Notebook	Description	Link
SFT (TRL)	Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL.

📈 Performance

Model	RealWorldQA	MM-IFEval	InfoVQA (Val)	OCRBench	BLINK	MMStar	MMMU (Val)	MathVista	SEEDBench_IMG	MMVet	MME	MMLU
InternVL3-2B	65.10	38.49	66.10	831	53.10	61.10	48.70	57.60	75.00	67.00	2186.40	64.80
InternVL3-1B	57.00	31.14	54.94	798	43.00	52.30	43.20	46.90	71.20	58.70	1912.40	49.80
SmolVLM2-2.2B	57.50	19.42	37.75	725	42.30	46.00	41.60	51.50	71.30	34.90	1792.50	-
LFM2-VL-1.6B	65.23	37.66	58.68	742	44.40	49.53	38.44	51.10	71.97	48.07	1753.04	50.99

Model	RealWorldQA	MM-IFEval	InfoVQA (Val)	OCRBench	BLINK	MMStar	MMMU (Val)	MathVista	SEEDBench_IMG	MMVet	MME	MMLU
SmolVLM2-500M	49.90	11.27	24.64	609	40.70	38.20	34.10	37.50	62.20	29.90	1448.30	-
LFM2-VL-450M	52.29	26.18	46.51	655	41.98	40.87	33.11	44.70	63.50	33.76	1239.06	40.16

We obtained MM-IFEval and InfoVQA (Val) scores for the InternVL3 and SmolVLM2 models using VLMEvalKit.

📬 Contact

If you are interested in custom solutions with edge deployment, please contact our sales team.


GGUF

Model size: 0.4B params
Architecture: lfm2
Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for gabriellarson/LFM2-VL-450M-GGUF

Base model: LiquidAI/LFM2-VL-450M
Quantized versions of the base model (15), including this model