---
pipeline_tag: text-generation
language: uig
license: gemma
tags:
  - trimmed
library_name: transformers
base_model: google/gemma-3-4b-it
base_model_relation: quantized
datasets:
  - lbourdois/fineweb-2-trimming
---

# gemma-3-4b-it-uig-16384
This model is a **14.63%** smaller version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it), optimized for the Uyghur language by reducing the vocabulary size with the [trimming](https://huggingface.co/blog/lbourdois/introduction-to-trimming) method.  
The trimmed model keeps only 16,384 tokens and should perform similarly to the original model on Uyghur text while using less memory. However, it may perform poorly in other languages, because tokens that are rarely used in Uyghur were removed from the vocabulary.
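
As a rough illustration of the idea behind vocabulary trimming, the sketch below counts token frequencies over a tokenized corpus, keeps the most frequent ids plus a few reserved ones, and slices the embedding rows accordingly. It is a toy example under stated assumptions, not the implementation used to produce this model: the helper names `select_kept_ids` and `trim_embeddings`, the toy corpus, and the reserved ids are all hypothetical.

```python
from collections import Counter


def select_kept_ids(tokenized_texts, keep, always_keep=()):
    """Return sorted token ids to keep: reserved ids first, then the most frequent ones."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    kept = list(dict.fromkeys(always_keep))  # reserved ids, duplicates dropped
    for tok, _ in counts.most_common():
        if len(kept) >= keep:
            break
        if tok not in kept:
            kept.append(tok)
    return sorted(kept)


def trim_embeddings(embeddings, kept_ids):
    """Keep only the embedding rows of kept_ids and map old ids to new ids."""
    new_embeddings = [embeddings[i] for i in kept_ids]
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    return new_embeddings, old_to_new


# Toy setup (assumption): 8-token vocabulary with 2-dimensional embeddings.
toy_embeddings = [[float(i), float(i) * 0.5] for i in range(8)]
# Toy "mining" corpus (assumption): texts already converted to token ids.
toy_corpus = [[5, 5, 3, 7], [5, 3, 3, 2], [7, 5]]

# Keep 4 tokens in total; ids 0 and 1 stand in for special tokens.
kept_ids = select_kept_ids(toy_corpus, keep=4, always_keep=(0, 1))
trimmed, old_to_new = trim_embeddings(toy_embeddings, kept_ids)

print(kept_ids)      # [0, 1, 3, 5]
print(len(trimmed))  # 4
print(old_to_new)    # {0: 0, 1: 1, 3: 2, 5: 3}
```

Tokens that never made the cut (here ids 2, 4, 6, and 7) no longer have an embedding row, which is why text in other languages is likely to be handled poorly.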

## Model Statistics
| Metric | Original | Trimmed | Reduction |
|--------|----------|---------|-----------|
| **Vocabulary size** | 262,144 tokens | 16,384 tokens | **93.75%** |
| **Model size** | 4,300,079,472 params | 3,670,770,032 params | **14.63%** |

![image](https://raw.githubusercontent.com/lbourdois/blog/refs/heads/master/assets/images/Trimming/gemma-3-4b-it-16384.png)
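
The reductions in the table above can be reproduced with a few lines of arithmetic. The snippet below is a minimal sketch that uses only the figures from the table; the memory estimate additionally assumes 2 bytes per parameter (for example bfloat16 weights), which is an assumption and not something stated by this card.

```python
def reduction_pct(original, trimmed):
    """Percentage by which `trimmed` is smaller than `original`."""
    return 100.0 * (original - trimmed) / original


# Figures taken from the Model Statistics table.
vocab_original, vocab_trimmed = 262_144, 16_384
params_original, params_trimmed = 4_300_079_472, 3_670_770_032

vocab_reduction = reduction_pct(vocab_original, vocab_trimmed)
param_reduction = reduction_pct(params_original, params_trimmed)
params_removed = params_original - params_trimmed

# Assumption: weights stored in a 16-bit dtype, i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2
gib_saved = params_removed * BYTES_PER_PARAM / 1024**3

print(f"Vocabulary reduction: {vocab_reduction:.2f}%")  # 93.75%
print(f"Parameter reduction:  {param_reduction:.2f}%")  # 14.63%
print(f"Parameters removed:   {params_removed:,}")      # 629,309,440
print(f"Approx. memory saved: {gib_saved:.2f} GiB")
```

The parameter reduction is far smaller than the vocabulary reduction because only the token embedding rows are removed; the rest of the network is unchanged.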

## Mining Dataset Statistics
- **Number of texts used for mining**: 24,729 texts  
- **Dataset**: [lbourdois/fineweb-2-trimming](https://huggingface.co/datasets/lbourdois/fineweb-2-trimming)

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alphaedge-ai/gemma-3-4b-it-uig-16384"

# Load the trimmed tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Build a single-turn conversation and render it with the model's chat template.
prompt = "Your prompt in Uyghur."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then drop the prompt tokens so only the newly generated reply is decoded.
generated_ids = model.generate(**model_inputs, max_new_tokens=256)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```

## Citations

#### Gemma 3
```bibtex
@misc{gemmateam2025gemma3technicalreport,
      title={Gemma 3 Technical Report},
      author={Gemma Team},
      year={2025},
      eprint={2503.19786},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.19786},
}
```

#### Trimming blog post
```bibtex
@misc{hf_blogpost_trimming,
      title={Introduction to Trimming},
      author={Loïck BOURDOIS and Tom AARSEN and Bram VANROY and Christopher AKIKI and Woojun JUNG and Manuel ROMERO and Prithiv SAKTHI},
      year={2026},
      url={https://huggingface.co/blog/lbourdois/introduction-to-trimming},
}
```

## License
This model is derived from [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it).
Use of this model is governed by the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
By using this model, you agree to the Gemma Terms of Use. This model is not affiliated with or endorsed by Google.