---
library_name: transformers
tags:
- trl
- sft
license: mit
language:
- en
- tl
base_model:
- aisingapore/Gemma-SEA-LION-v3-9B-IT
pipeline_tag: translation
---

# Sea-Lion Taglish Translation Model

## Model Summary

This model is a fine-tuned version of **Sea Lion v3 9B IT**, adapted for English-to-Taglish machine translation. Taglish is a code-switched variety of English and Tagalog commonly used in the Philippines. The model was trained to generate fluent, naturalistic Taglish output from English input, with a focus on informal and social media domains.

The fine-tuning process involved lightweight QLoRA-based training on synthetic parallel examples from the Tweet Taglish dataset, following structured chat-style instruction tuning.

## Intended Use

This model is intended for research, experimentation, and development of machine translation systems that support bilingual or code-switched output. It is particularly suited for:

- Translating English to Taglish
- Studying code-switching behavior in LLMs
- Applications in multilingual NLP for Southeast Asian languages

**Not recommended** for high-stakes or formal use cases such as medical, legal, or governmental translation.

## Model Specs

- **Developed by:** Charlotte Puopolo
- **Model type:** Machine Translation trained on English-Taglish parallel Tweets
- **Language(s) (NLP):** Taglish (English-Tagalog code-switching)
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** Tweet Taglish dataset (Herrera et al., 2022)

### Model Sources

- **Repository:** https://github.com/puopolo/Taglish-Translation/tree/main/prompts
- **Paper:** [Coming soon]
- 
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-hf-username/sea-lion-taglish")
model = AutoModelForCausalLM.from_pretrained("your-hf-username/sea-lion-taglish")

prompt = "Translate to Tagalog-English code-switching: I need to go shopping later."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))

```

## How to Cite
If you use this model, please cite:

@misc{puopolo2025taglish,
  author = {Charlotte Puopolo},
  title = {Analyzing LLM Performance on Taglish Translation},
  year = {2025},
  note = {Hugging Face Model Repository},
  url = {https://huggingface.co/charlottepuopolo/sealion-3v-9b-it-taglish}
}