granite-4.0-h-tiny-gguf

🚀 GGUF-quantized release of IBM Granite 4.0 H Tiny, converted for efficient local inference.

This repository provides GGUF-format versions of the original
ibm-granite/granite-4.0-h-tiny model, enabling fast execution with llama.cpp, Ollama, LM Studio, and other GGUF-compatible runtimes.


Model Overview

  • Base Model: IBM Granite 4.0 H Tiny
  • Model Type: Decoder-only hybrid Mamba-2/Transformer (GGUF architecture: granitehybrid)
  • Format: GGUF
  • Pipeline: Text Generation
  • Language: English

These builds are intended for lightweight, low-latency inference while preserving the base model's instruction-following capabilities.


What This Repository Contains

  • ✅ GGUF-converted model weights
  • ✅ Quantized for CPU / low-VRAM environments
  • ✅ Ready for local & edge inference
  • ❌ No fine-tuning or weight modification (format conversion only)

Usage

Ollama

Create a Modelfile:

# Path to the downloaded GGUF file (adjust to the file name you downloaded)
FROM ./granite-4.0-h-tiny.gguf

Run:

ollama create granite-tiny -f Modelfile
ollama run granite-tiny
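
Once the model has been created, it can also be queried from a script through the HTTP API that a running Ollama server exposes. The following is a minimal sketch, assuming the server listens on Ollama's default address (http://localhost:11434) and that the model was registered as granite-tiny as in the commands above. The helper names build_generate_request and generate are illustrative and not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "granite-tiny") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "granite-tiny", timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server returns a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]


# Example (requires a running Ollama server):
# print(generate("Summarize what GGUF is in one sentence."))
```

Separating request construction from the network call keeps the payload easy to inspect without a server running.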

LM Studio / GUI Tools

  1. Open LM Studio
  2. Load the .gguf file
  3. Select a llama.cpp backend
  4. Start inference

Quantization Notes

GGUF quantization provides:

  • Reduced memory usage
  • Faster load times
  • Compatibility with CPU-only systems

Recommended usage:

  • Q4 / Q5: Best balance of speed & quality
  • Q8: Higher quality, more memory
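
To make the memory trade-off concrete, the weight size can be approximated as parameters × bits per weight ÷ 8. The sketch below is a rough estimate only: it assumes about 7B parameters (the size listed for this model on its Hugging Face page) and uses nominal bit widths of 4, 5, and 8. Real GGUF files are somewhat larger because of per-block scales and metadata, and runtime memory also includes the context cache.

```python
# Nominal bits per weight for each quantization family (assumption: real GGUF
# files use slightly more because of per-block scales and metadata).
NOMINAL_BITS = {"Q4": 4, "Q5": 5, "Q8": 8}


def estimate_weight_gb(n_params: float, quant: str) -> float:
    """Return an approximate weight size in gigabytes (10^9 bytes)."""
    bits = NOMINAL_BITS[quant]
    return n_params * bits / 8 / 1e9


N_PARAMS = 7e9  # assumption: about 7B parameters

for quant in NOMINAL_BITS:
    print(f"{quant}: ~{estimate_weight_gb(N_PARAMS, quant):.1f} GB")
```

Under these assumptions, Q4 needs roughly half the weight memory of Q8, which is why Q4 / Q5 suit CPU-only and low-VRAM machines.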

Intended Use Cases

  • Text generation
  • Review & sentiment analysis
  • QA automation pipelines
  • Agentic systems (RAG, MCP, LangGraph, CrewAI)
  • Offline / embedded AI applications
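
As an illustration of the review and sentiment analysis use case, the sketch below builds a classification prompt and maps the model's free-form reply to a fixed label. The prompt wording, the three-label scheme, and the helper names are assumptions chosen for the example, not something prescribed by the base model. The reply string would come from whichever runtime you use (for example Ollama or LM Studio).

```python
LABELS = ("positive", "negative", "neutral")


def build_sentiment_prompt(review: str) -> str:
    """Ask the model for exactly one label so the reply is easy to parse."""
    return (
        "Classify the sentiment of the following review as positive, "
        "negative, or neutral. Reply with one word only.\n\n"
        f"Review: {review}\nSentiment:"
    )


def parse_sentiment(reply: str) -> str:
    """Map a free-form model reply to a label; return 'unknown' if none is found."""
    words = reply.lower().replace(".", " ").replace(",", " ").split()
    for word in words:
        if word in LABELS:
            return word  # first recognized label wins
    return "unknown"
```

Returning "unknown" instead of guessing lets a pipeline flag replies that need a retry or manual review.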

Attribution


License

This repository is distributed under the MIT license. Note that the original ibm-granite/granite-4.0-h-tiny model is released by IBM under the Apache 2.0 license.

Please review the base model license before commercial usage.


Disclaimer

This is an unofficial GGUF conversion.
IBM is not affiliated with or responsible for this release.
