How to use from the llama-cpp-python library
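# !pip install llama-cpp-python
# Llama.from_pretrained also needs the huggingface-hub package to fetch the file.

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="g023/Holo-3.1-4B-GGUF",
    filename="Holo-3.1-4B-GGUF-Q4_K_M.gguf",
)

# Run a single-turn chat; the result is an OpenAI-style chat completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)

# Print the full response dict.
print(response)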
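
The call above returns an OpenAI-style chat completion dictionary rather than a plain string. A minimal sketch of pulling the reply text out of it follows. `extract_reply` and `sample_response` are hypothetical names introduced here for illustration, and the sample dictionary is hand-written to mimic the response shape; it is not real model output.

```python
from typing import Any, Dict


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant's text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Each choice carries a "message" dict with "role" and "content" keys.
    content = choices[0]["message"].get("content")
    # Content can be None or padded with whitespace; normalize to a clean string.
    return (content or "").strip()


# Hand-written stand-in for a real response (assumption: illustrative only).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " example reply "},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample_response))
```

With a loaded model, the return value of `llm.create_chat_completion(...)` can be passed to `extract_reply` directly in place of `sample_response`.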

GGUF for Holo3.1

@misc{hai2026holo31,
      title={Holo3.1: Fast \& Local Computer Use Agents},
      author={H Company},
      year={2026},
      url={https://huggingface.co/Hcompany/Holo3.1-35B-A3B},
}
Format: GGUF
Model size: 5B params
Architecture: qwen35
Quantization: 4-bit


Model tree for g023/Holo-3.1-4B-GGUF

Finetuned: Qwen/Qwen3.5-4B
Quantized (281): this model