Sea-Lion Taglish Translation Model

Model Summary

This model is a fine-tuned version of Sea Lion v3 9B IT, adapted for English-to-Taglish machine translation. Taglish is a code-switched variety of English and Tagalog commonly used in the Philippines. The model was trained to generate fluent, naturalistic Taglish output from English input, with a focus on informal and social media domains.

The fine-tuning process involved lightweight QLoRA-based training on synthetic parallel examples from the Tweet Taglish dataset, following structured chat-style instruction tuning.

Intended Use

This model is intended for research, experimentation, and development of machine translation systems that support bilingual or code-switched output. It is particularly suited for:

Translating English to Taglish
Studying code-switching behavior in LLMs
Applications in multilingual NLP for Southeast Asian languages

Not recommended for high-stakes or formal use cases such as medical, legal, or governmental translation.

Model Specs

Developed by: Charlotte Puopolo
Model type: Machine Translation trained on English-Taglish parallel Tweets
Language(s) (NLP): Taglish (English-Tagalog code-switching)
License: [More Information Needed]
Finetuned from model [optional]: Tweet Taglish dataset (Herrera et al., 2022)

Model Sources

Repository: https://github.com/puopolo/Taglish-Translation/tree/main/prompts
Paper: [Coming soon]

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-hf-username/sea-lion-taglish")
model = AutoModelForCausalLM.from_pretrained("your-hf-username/sea-lion-taglish")

prompt = "Translate to Tagalog-English code-switching: I need to go shopping later."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))

How to Cite

If you use this model, please cite:

@misc{puopolo2025taglish, author = {Charlotte Puopolo}, title = {Analyzing LLM Performance on Taglish Translation}, year = {2025}, note = {Hugging Face Model Repository}, url = {https://huggingface.co/charlottepuopolo/sealion-3v-9b-it-taglish} }

Downloads last month: 2

Safetensors

Model size

9B params

Tensor type

F16

Model tree for charlottepuopolo/sealion-3v-9b-it-taglish

Base model

google/gemma-2-9b

Finetuned

aisingapore/Gemma-SEA-LION-v3-9B

Finetuned

aisingapore/Gemma-SEA-LION-v3-9B-IT

Finetuned

(3)

this model