T5-Gemma-2-1B-Instruct-Chat-Indo-v2
This model is a LoRA adapter at the 1B scale, fine-tuned from the base model google/t5gemma-2-1b-1b using Supervised Fine-Tuning (SFT). It focuses on instruction understanding and natural multi-turn dialogue in Indonesian and English.
Model Characteristics and Evaluation
- Bilingual Conversational Capability: The model is configured to handle Indonesian instructions in a structured way, and it can switch to English based on context or user request.
- In-Task Learning & Cross-Attention: Uses an Encoder-Decoder (Seq2Seq) architecture to implicitly map encoder representations to the task category (such as summarization, translation, document-grounded Q&A, and paraphrasing) without requiring a static prompt.
- Evaluation Metrics: Achieves a BLEU score of 24.31% (with an evaluation loss of 2.699) at evaluation step 1000, using the AdamW optimizer.
- Logit Masking: Applies logit masking to block the logits of unused tokens and of vision tokens in the encoder, in order to minimize token-formation errors during text generation.
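To illustrate why adding a large negative constant to a logit effectively blocks a token, here is a minimal sketch on a toy five-token vocabulary. The helper names `softmax` and `mask_logits` and the toy values are hypothetical; only the penalty of -10000.0 mirrors the usage script in this card.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mask_logits(logits, suppress_ids, penalty=-10000.0):
    """Return a copy of logits with `penalty` added at each suppressed id."""
    blocked = set(suppress_ids)
    return [x + penalty if i in blocked else x for i, x in enumerate(logits)]

# Toy vocabulary of 5 tokens; ids 1 and 3 stand in for unused/vision tokens.
toy_logits = [2.0, 5.0, 1.0, 4.0, 0.5]
masked_probs = softmax(mask_logits(toy_logits, suppress_ids=[1, 3]))
print([round(p, 4) for p in masked_probs])
```

Even though ids 1 and 3 had the highest raw logits, their probability after masking underflows to zero, so sampling can only pick from the remaining tokens.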
Usage
The following Python script loads the base model together with this LoRA adapter using the transformers and peft libraries:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

base_model_name = "google/t5gemma-2-1b-1b"
adapter_id = "daruokta/t5gemma-2-1b-1b-instruct-chat-indo-v2"

# 1. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)

# 2. Load the base model (bfloat16 on GPU, float32 on CPU)
base_model = AutoModelForSeq2SeqLM.from_pretrained(
    base_model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None
)

# 3. Apply logit masking: add a large negative bias to unused and vision token ids
vocab_size = base_model.config.vocab_size
suppress_block1 = list(range(6, 105))
suppress_block2 = list(range(256002, 262144))
suppress_vision = [255999, 256000, 256001]
suppress_ids = [i for i in (suppress_block1 + suppress_block2 + suppress_vision) if i < vocab_size]
mask = torch.zeros(vocab_size, dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32)
mask[suppress_ids] = -10000.0

def forward_hook(module, inputs, outputs):
    # Add the mask in place to the logits, whether outputs is a ModelOutput or a tuple
    if hasattr(outputs, "logits"):
        outputs.logits.add_(mask.to(outputs.logits.device))
    elif isinstance(outputs, tuple):
        outputs[0].add_(mask.to(outputs[0].device))
    return outputs

base_model.register_forward_hook(forward_hook)

# 4. Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_id, subfolder="checkpoint-1000")
model.eval()

# 5. Conversational inference
messages = [
    {"role": "system", "content": "Kamu adalah asisten AI yang helpful, santai, dan ramah. Gunakan Bahasa Indonesia sebagai bahasa utama."},
    {"role": "user", "content": "Tolong jelaskan secara singkat apa itu fotosintesis."}
]

# Build the prompt; the system message is folded into the first user turn
prompt = ""
is_first_user = True
for msg in messages:
    if msg["role"] == "system":
        continue
    if msg["role"] == "user":
        prompt += "<start_of_turn>user\n"
        if is_first_user:
            prompt += messages[0]["content"] + "\n\n"
            is_first_user = False
        prompt += msg["content"] + "<end_of_turn>\n"
    elif msg["role"] in ["assistant", "model"]:
        prompt += "<start_of_turn>model\n" + msg["content"] + "<end_of_turn>\n"
prompt += "<start_of_turn>model\n"

inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        repetition_penalty=1.0,
        do_sample=True,
        eos_token_id=[tokenizer.convert_tokens_to_ids("<end_of_turn>"), tokenizer.eos_token_id]
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response.strip())
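The prompt-building loop in the script can also be packaged as a reusable function. The following is a minimal sketch: `build_prompt` is a hypothetical helper, not part of transformers or peft, and unlike the loop in the script it looks up the system message by role rather than assuming it is the first entry.

```python
def build_prompt(messages):
    """Render chat messages into the turn format used by the usage script.

    The system message (if any) is folded into the first user turn.
    """
    system_text = next(
        (m["content"] for m in messages if m["role"] == "system"), None
    )
    parts = []
    first_user = True
    for msg in messages:
        role = msg["role"]
        if role == "system":
            continue
        if role == "user":
            parts.append("<start_of_turn>user\n")
            if first_user and system_text is not None:
                parts.append(system_text + "\n\n")
            first_user = False
            parts.append(msg["content"] + "<end_of_turn>\n")
        elif role in ("assistant", "model"):
            parts.append(
                "<start_of_turn>model\n" + msg["content"] + "<end_of_turn>\n"
            )
    # Open a model turn so generation continues as the assistant.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

demo = build_prompt([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Thanks"},
])
print(demo)
```

The returned string can be passed to the tokenizer with `add_special_tokens=False`, as the script does.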
Training Specifications and Hyperparameters
- Dataset: daruokta/t5gemma2-indonesia-chat-formatted (31,299 samples)
- LoRA Configuration:
  - Rank (r): 128
  - Alpha ($\alpha$): 256
  - Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  - Dropout: 0.2
- Training Hyperparameters:
  - Epochs: 5 (stopped at epoch 4.08 / step 1000 due to technical problems with the kernel)
  - Optimizer: paged_adamw_8bit (Paged AdamW 8-bit)
  - Learning Rate: 1e-5 (cosine decay scheduler with 100 warmup steps)
  - Batch Size: 4 per device (Gradient Accumulation Steps: 32)
  - Precision: Mixed Precision bfloat16
  - Context Length (Source / Target): 2048 / 512
  - Label Smoothing Factor: 0.1
  - NEFTune Noise Alpha: 5.0
  - Weight Decay: 0.1
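The listed hyperparameters can be cross-checked with some simple arithmetic. The sketch below assumes a single device, that all 31,299 samples are used for training, standard LoRA scaling (alpha / r, not rsLoRA), and a schedule with linear warmup followed by cosine decay to zero over the planned 5 epochs. None of these assumptions is stated in the card, and exact step counts depend on how the trainer rounds.

```python
import math

# Values taken from the hyperparameter list above.
num_samples = 31_299
per_device_batch = 4
grad_accum_steps = 32
planned_epochs = 5
base_lr = 1e-5
warmup_steps = 100
stopped_at_step = 1000
lora_rank = 128
lora_alpha = 256

# Assumption: one device, and the full dataset is used for training.
num_devices = 1

effective_batch = per_device_batch * grad_accum_steps * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = planned_epochs * steps_per_epoch
epochs_done = stopped_at_step / steps_per_epoch

# Assumption: standard LoRA scaling factor alpha / r.
lora_scale = lora_alpha / lora_rank

def cosine_lr(step):
    """Linear warmup, then cosine decay to zero over the remaining steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print("effective batch:", effective_batch)
print("steps per epoch:", steps_per_epoch)
print("planned total steps:", total_steps)
print("epochs completed at stop:", round(epochs_done, 2))
print("LoRA scale:", lora_scale)
print("learning rate at stop:", cosine_lr(stopped_at_step))
```

Under these assumptions, step 1000 falls at roughly epoch 4.08, which is consistent with the stopping point reported in the list.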
Limitations & License
- License: Follows the base license of Google's Gemma models.
- Limitations: Performance is best in Indonesian and English, which were the focus of training. Responses in regional languages or other foreign languages may vary in accuracy.
Library Versions
- PEFT: 0.19.1
- PyTorch: 2.12.0+cu130
- Transformers: 5.11.0
- Bitsandbytes: 0.49.2
- Accelerate: 1.14.0
- Datasets: 5.0.0
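To compare a local environment against the versions listed above, a small check with the standard library may help. This is a sketch: the distribution names in `EXPECTED` are assumed to match the PyPI package names, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.19.1",
    "torch": "2.12.0+cu130",
    "transformers": "5.11.0",
    "bitsandbytes": "0.49.2",
    "accelerate": "1.14.0",
    "datasets": "5.0.0",
}

def compare_versions(expected):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report

for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load; the list records the versions used during training.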
References & Scientific Publications
Primary Reference Papers
T5Gemma 2: Seeing, Reading, and Understanding Longer
- Biao Zhang, et al. (Google DeepMind, 2025).
- arXiv: 2512.14856
- Relevance: Describes the architectural details of T5Gemma 2, including Tied Word Embeddings, Merged Attention, and the efficiency of encoder-decoder models over long contexts.
Return of the Encoder: Maximizing Parameter Efficiency for SLMs
- Mohamed Elfeki, et al. (Microsoft, 2025).
- arXiv: 2501.16273
- Relevance: Provides empirical evidence that encoder-decoder models are more parameter-efficient than decoder-only models for instruction-based tasks and structured dialogue.
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge
- Han Ren, et al. (2025).
- arXiv: 2502.20186
- Relevance: A theory of aligning multi-task model weights by selectively disabling them per layer (layer-aware) to avoid feature collision.
Editing Models with Task Arithmetic
- Gabriel Ilharco, et al. (2023).
- arXiv: 2212.04089
- Relevance: A model-editing (model surgery) method based on arithmetic manipulation of parameter vectors.
Gemma 3 Technical Report
- Google DeepMind (2025).
- arXiv: 2503.19786
- Relevance: The technical report for the Gemma 3 base model, from which the T5Gemma 2 initialization weights are transferred.
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
- Fajri Koto, et al. (2024).
- arXiv: 2404.06138
- Relevance: A framework for curating conversational instruction data to localize model performance across the languages of Indonesia.
BibTeX Citations
@article{zhang2025t5gemma2,
title={T5Gemma 2: Seeing, Reading, and Understanding Longer},
author={Zhang, Biao and others},
journal={arXiv preprint arXiv:2512.14856},
year={2025}
}
@article{elfeki2025return,
title={Return of the Encoder: Maximizing Parameter Efficiency for SLMs},
author={Elfeki, Mohamed and others},
journal={arXiv preprint arXiv:2501.16273},
year={2025}
}
@article{ren2025layer,
title={Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge},
author={Ren, Han and others},
journal={arXiv preprint arXiv:2502.20186},
year={2025}
}
@article{ilharco2022editing,
title={Editing models with task arithmetic},
author={Ilharco, Gabriel and others},
journal={arXiv preprint arXiv:2212.04089},
year={2022}
}
@article{gemma3report,
title={Gemma 3 Technical Report},
author={DeepMind, Google},
journal={arXiv preprint arXiv:2503.19786},
year={2025}
}
@article{koto2024cendol,
title={Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages},
author={Koto, Fajri and others},
journal={arXiv preprint arXiv:2404.06138},
year={2024}
}
Evaluation results
- BLEU on T5-Gemma-2 Indonesia Chat Formatted (self-reported): 24.315
- ROUGE-1 on T5-Gemma-2 Indonesia Chat Formatted (self-reported): 66.163
- ROUGE-2 on T5-Gemma-2 Indonesia Chat Formatted (self-reported): 49.309
- ROUGE-L on T5-Gemma-2 Indonesia Chat Formatted (self-reported): 64.311
- ROUGE-Lsum on T5-Gemma-2 Indonesia Chat Formatted (self-reported): 64.492