---
license: apache-2.0
library_name: transformers
tags:
- language
- granite-4.0
---
# Granite-4.0-H-Tiny
**Model Summary:**
Granite-4.0-H-Tiny is a 7B-parameter long-context instruct model finetuned from *Granite-4.0-H-Tiny-Base* using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets. The model was developed with a structured chat format and a diverse set of techniques, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved *instruction following (IF)* and *tool-calling* capabilities, making them more effective in enterprise applications.
- **Developers:** Granite Team, IBM
- **HF Collection:** [Granite 4.0 Language Models HF Collection](https://huggingface.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c)
- **GitHub Repository:** [ibm-granite/granite-4.0-language-models](https://github.com/ibm-granite/granite-4.0-language-models)
- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
- **Release Date**: October 2nd, 2025
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
**Supported Languages:**
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 4.0 models for languages beyond these 12.
**Intended use:**
The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
*Capabilities*
* Summarization
* Text classification
* Text extraction
* Question-answering
* Retrieval Augmented Generation (RAG)
* Code related tasks
* Function-calling tasks
* Multilingual dialog use cases
**Generation:**
This is a simple example of how to use the Granite-4.0-H-Tiny model.
Install the following libraries:
```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
Then, copy the snippet from the section that is relevant for your use case.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "ibm-granite/granite-4.0-h-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
{ "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
```
Expected output:
```shell
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, San Jose, California<|end_of_text|>
```
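The expected output above still contains the prompt and the role markers, because `batch_decode` is called on the full generated sequence without `skip_special_tokens=True`. A minimal sketch for keeping only the assistant's reply, assuming the `<|start_of_role|>assistant<|end_of_role|>` and `<|end_of_text|>` markers appear exactly as shown in that output; `extract_assistant_reply` is a hypothetical helper, not part of `transformers`:

```python
# Role markers as they appear in the expected output above (assumed stable).
ASSISTANT_START = "<|start_of_role|>assistant<|end_of_role|>"
END_OF_TEXT = "<|end_of_text|>"


def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded Granite transcript.

    Hypothetical helper: returns an empty string if no assistant turn is found.
    """
    start = decoded.rfind(ASSISTANT_START)
    if start == -1:
        return ""
    reply = decoded[start + len(ASSISTANT_START):]
    # Cut at the end-of-text marker if generation stopped there.
    end = reply.find(END_OF_TEXT)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


sample = (
    "<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory "
    "located in the United States. You should only output its name and location."
    "<|end_of_text|>\n"
    "<|start_of_role|>assistant<|end_of_role|>Almaden Research Center, San Jose, "
    "California<|end_of_text|>"
)
print(extract_assistant_reply(sample))  # Almaden Research Center, San Jose, California
```

In the generation snippet it would be called as `extract_assistant_reply(output[0])`. Another option is to slice the generated tensor past the prompt length before calling `batch_decode(..., skip_special_tokens=True)`; the string-based version works on any decoded transcript.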
**Tool-calling:**
Granite-4.0-H-Tiny has enhanced tool-calling capabilities for integrating with external functions and APIs. To define a list of tools, follow OpenAI's function [definition schema](https://platform.openai.com/docs/guides/function-calling?api-mode=responses#defining-functions).
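Keeping a tool definition in sync with the Python function that implements it is easier when the definition is derived from the function. A minimal standard-library sketch, assuming only `str`/`int`/`float`/`bool` parameters and using the docstring as the description; `tool_schema`, `dispatch`, `REGISTRY`, and the stub `get_current_weather` are hypothetical names. `dispatch` assumes the model's tool call has already been parsed into a dict with `name` and `arguments` keys; check the chat template for the exact format the model emits.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Map Python annotations to JSON Schema types (only simple types are handled).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn: Callable[..., Any], param_docs: Dict[str, str]) -> Dict[str, Any]:
    """Build an OpenAI-style function definition from a Python function."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {
            "type": _JSON_TYPES[param.annotation],
            "description": param_docs[name],
        }
        # Parameters without a default value are marked as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_current_weather(city: str) -> str:
    """Get the current weather for a specified city."""
    # Hypothetical stub: a real tool would call a weather API here.
    return json.dumps({"city": city, "forecast": "sunny"})


# Hypothetical registry mapping tool names to the functions that implement them.
REGISTRY = {"get_current_weather": get_current_weather}


def dispatch(call: Dict[str, Any]) -> str:
    """Run a parsed tool call of the form {"name": ..., "arguments": {...}}."""
    fn = REGISTRY[call["name"]]
    return fn(**call["arguments"])


schema = tool_schema(get_current_weather, {"city": "Name of the city"})
print(json.dumps(schema, indent=2))
print(dispatch({"name": "get_current_weather", "arguments": {"city": "Boston"}}))
```

For `get_current_weather`, the generated `schema` is identical to the entry in the `tools` list of the example that follows, so it could be passed as `tools=[schema]` to `apply_chat_template`.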
The following example shows how to use the tool-calling ability of Granite-4.0-H-Tiny:
```python
# reuses `tokenizer`, `model`, and `device` from the generation example above
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a specified city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "Name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]
# change input text as desired
chat = [
{ "role": "user", "content": "What's the weather like in Boston right now?" },
]
chat = tokenizer.apply_chat_template(chat,
                                     tokenize=False,
                                     tools=tools,
                                     add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
# generate output tokens
output = model.generate(**input_tokens,
max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output[0])
```
Expected output:
```shell
<|start_of_role|>system<|end_of_role|>You are a helpful assistant with access to the following tools. You may call one or more tools to assist with the user query.
You are provided with function signatures within
```

**Model Architecture:**

| Model | Micro Dense | H Micro Dense | H Tiny MoE | H Small MoE |
|---|---|---|---|---|
| Embedding size | 2560 | 2048 | 1536 | 4096 |
| Number of layers | 40 attention | 4 attention / 36 Mamba2 | 4 attention / 36 Mamba2 | 4 attention / 36 Mamba2 |
| Attention head size | 64 | 64 | 128 | 128 |
| Number of attention heads | 40 | 32 | 12 | 32 |
| Number of KV heads | 8 | 8 | 4 | 8 |
| Mamba2 state size | - | 128 | 128 | 128 |
| Number of Mamba2 heads | - | 64 | 48 | 128 |
| MLP / Shared expert hidden size | 8192 | 8192 | 1024 | 1536 |
| Number of experts | - | - | 64 | 72 |
| Number of active experts | - | - | 6 | 10 |
| Expert hidden size | - | - | 512 | 768 |
| MLP activation | SwiGLU | SwiGLU | SwiGLU | SwiGLU |
| Sequence length | 128K | 128K | 128K | 128K |
| Position embedding | RoPE | NoPE | NoPE | NoPE |
| Number of parameters | 3B | 3B | 7B | 32B |
| Number of active parameters | 3B | 3B | 1B | 9B |
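Two quantities follow directly from the table. In every configuration, the number of attention heads times the attention head size equals the embedding size. And since only attention layers keep a key/value cache that grows with context length, the hybrid models with 4 attention layers need far less KV-cache memory at long context. A back-of-the-envelope sketch, assuming a bf16 cache (2 bytes per value), "128K" meaning 131,072 tokens, batch size 1, and ignoring the fixed-size Mamba2 state and any framework overhead:

```python
# Per-configuration values copied from the architecture table.
CONFIGS = {
    # name: (embedding size, attention layers, attention heads, KV heads, head size)
    "Micro Dense": (2560, 40, 40, 8, 64),
    "H Micro Dense": (2048, 4, 32, 8, 64),
    "H Tiny MoE": (1536, 4, 12, 4, 128),
    "H Small MoE": (4096, 4, 32, 8, 128),
}

BYTES_PER_VALUE = 2           # assumption: bf16 KV cache
CONTEXT_TOKENS = 128 * 1024   # assumption: "128K" = 131,072 tokens


def kv_cache_bytes(attn_layers: int, kv_heads: int, head_size: int, tokens: int) -> int:
    """Keys + values for every attention layer, KV head, and token (batch size 1)."""
    return 2 * attn_layers * kv_heads * head_size * tokens * BYTES_PER_VALUE


for name, (emb, layers, heads, kv_heads, head_size) in CONFIGS.items():
    # The query heads, concatenated, span the full embedding dimension.
    assert heads * head_size == emb
    gib = kv_cache_bytes(layers, kv_heads, head_size, CONTEXT_TOKENS) / 2**30
    print(f"{name:14s} GQA group size {heads // kv_heads}, KV cache at 128K: {gib:.2f} GiB")
```

Under these assumptions the dense Micro model needs 10 GiB of KV cache at full context, while H Tiny needs 1 GiB, because it has 4 instead of 40 attention layers and fewer KV heads.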