Sea-Lion Taglish Translation Model

Model Summary

This model is a fine-tuned version of Sea Lion v3 9B IT, adapted for English-to-Taglish machine translation. Taglish is a code-switched variety of English and Tagalog commonly used in the Philippines. The model was trained to generate fluent, naturalistic Taglish output from English input, with a focus on informal and social media domains.

The fine-tuning process involved lightweight QLoRA-based training on synthetic parallel examples from the Tweet Taglish dataset, following structured chat-style instruction tuning.

Intended Use

This model is intended for research, experimentation, and development of machine translation systems that support bilingual or code-switched output. It is particularly suited for:

  • Translating English to Taglish
  • Studying code-switching behavior in LLMs
  • Applications in multilingual NLP for Southeast Asian languages

Not recommended for high-stakes or formal use cases such as medical, legal, or governmental translation.

Model Specs

  • Developed by: Charlotte Puopolo
  • Model type: Machine Translation trained on English-Taglish parallel Tweets
  • Language(s) (NLP): Taglish (English-Tagalog code-switching)
  • License: [More Information Needed]
  • Finetuned from model [optional]: Tweet Taglish dataset (Herrera et al., 2022)

Model Sources

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-hf-username/sea-lion-taglish")
model = AutoModelForCausalLM.from_pretrained("your-hf-username/sea-lion-taglish")

prompt = "Translate to Tagalog-English code-switching: I need to go shopping later."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))

How to Cite

If you use this model, please cite:

@misc{puopolo2025taglish, author = {Charlotte Puopolo}, title = {Analyzing LLM Performance on Taglish Translation}, year = {2025}, note = {Hugging Face Model Repository}, url = {https://huggingface.co/charlottepuopolo/sealion-3v-9b-it-taglish} }

Downloads last month
2
Safetensors
Model size
9B params
Tensor type
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for charlottepuopolo/sealion-3v-9b-it-taglish

Finetuned
(3)
this model