--- library_name: transformers tags: - trl - sft license: mit language: - en - tl base_model: - aisingapore/Gemma-SEA-LION-v3-9B-IT pipeline_tag: translation --- # Sea-Lion Taglish Translation Model ## Model Summary This model is a fine-tuned version of **Sea Lion v3 9B IT**, adapted for English-to-Taglish machine translation. Taglish is a code-switched variety of English and Tagalog commonly used in the Philippines. The model was trained to generate fluent, naturalistic Taglish output from English input, with a focus on informal and social media domains. The fine-tuning process involved lightweight QLoRA-based training on synthetic parallel examples from the Tweet Taglish dataset, following structured chat-style instruction tuning. ## Intended Use This model is intended for research, experimentation, and development of machine translation systems that support bilingual or code-switched output. It is particularly suited for: - Translating English to Taglish - Studying code-switching behavior in LLMs - Applications in multilingual NLP for Southeast Asian languages **Not recommended** for high-stakes or formal use cases such as medical, legal, or governmental translation. ## Model Specs - **Developed by:** Charlotte Puopolo - **Model type:** Machine Translation trained on English-Taglish parallel Tweets - **Language(s) (NLP):** Taglish (English-Tagalog code-switching) - **License:** [More Information Needed] - **Finetuned from model [optional]:** Tweet Taglish dataset (Herrera et al., 2022) ### Model Sources - **Repository:** https://github.com/puopolo/Taglish-Translation/tree/main/prompts - **Paper:** [Coming soon] - ## How to Use ```python from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("your-hf-username/sea-lion-taglish") model = AutoModelForCausalLM.from_pretrained("your-hf-username/sea-lion-taglish") prompt = "Translate to Tagalog-English code-switching: I need to go shopping later." inputs = tokenizer(prompt, return_tensors="pt") output = model.generate(**inputs, max_new_tokens=100) print(tokenizer.decode(output[0], skip_special_tokens=True)) ``` ## How to Cite If you use this model, please cite: @misc{puopolo2025taglish, author = {Charlotte Puopolo}, title = {Analyzing LLM Performance on Taglish Translation}, year = {2025}, note = {Hugging Face Model Repository}, url = {https://huggingface.co/charlottepuopolo/sealion-3v-9b-it-taglish} }