Umay-Aya-8B
A concise, accurate, and reliable Turkish assistant.
License. This model is trained on top of CohereLabs/aya-expanse-8b and inherits the base model's license: CC-BY-NC 4.0 (non-commercial) and Cohere's Acceptable Use Policy. Derivative use is subject to these terms.
Overview
Umay is a language model built on Cohere Labs' multilingual aya-expanse-8b, trained for Turkish instruction-following and technical tasks. Its design priority is to produce direct, actionable, and uncluttered Turkish output rather than long, diffuse text.
The name comes from Umay, the goddess of fertility and protection in Turkic mythology — reflecting the model's goal of producing reliable and accurate responses.
The model was trained with LoRA on a carefully curated Turkish supervised fine-tuning (SFT) dataset of 63,000 rows. The training data was repaired at the newline and character-encoding level to preserve the structural integrity of lists and code blocks.
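The card does not describe the repair procedure itself. As a rough illustration of what newline and character-encoding repair can look like, here is a minimal sketch. The function names `fix_mojibake` and `repair_row` are hypothetical, and the specific steps (undoing UTF-8 text that was mis-decoded as Latin-1, normalizing line endings, applying Unicode NFC) are assumptions, not the author's actual pipeline.

```python
import unicodedata


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Latin-1 (assumed failure mode)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not Latin-1 mojibake; leave it untouched.
        return text


def repair_row(text: str) -> str:
    """Hypothetical cleanup for one SFT row."""
    text = fix_mojibake(text)
    # Normalize Windows and old-Mac line endings so list items and
    # code blocks keep one line per entry.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Compose decomposed characters (e.g. "s" + combining cedilla) into one code point.
    return unicodedata.normalize("NFC", text)


print(repair_row("1. ilk\r\n2. ikinci"))
```

Text that is already clean passes through unchanged, so a function like this can be applied to every row without first detecting which rows are damaged.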
Intended Use
- Technical Q&A and explanation — direct, actionable answers in software architecture, cybersecurity, algorithms, cloud, and infrastructure.
- Code generation and debugging — working code examples with short, accurate explanations.
- Instruction following — output in specific formats (bullets, tables, step lists) and under specific constraints.
- Professional and technical writing — concise Turkish text in a professional tone.
- Cost-sensitive applications — lower output length reduces latency and per-request token cost.
Differentiation from the Base and Similar Models
Umay's distinguishing characteristic is that it produces answers of comparable quality in noticeably shorter form and delivers more stable output in technical domains. The results below come from an internal evaluation that ran Umay and the base model aya-expanse-8b under identical decoding settings.
Efficiency
On questions that both models answered correctly, the median answer length dropped from 173 to 82 words. Umay completes the same work as the base model in roughly half the length, which translates to lower latency and lower output-token cost.
| Metric | aya-expanse-8b | Umay | Difference |
|---|---|---|---|
| Median answer length (words) | 173 | 82 | −53% |
| Median generation time | reference | — | −24% |
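The length figures in the table can be reproduced with simple arithmetic. The sketch below assumes that answer length is counted as whitespace-separated words, which the card does not specify. The helper names are hypothetical.

```python
from statistics import median


def word_count(answer: str) -> int:
    """Count whitespace-separated words (assumed definition of answer length)."""
    return len(answer.split())


def median_length(answers: list[str]) -> float:
    """Median word count over a list of answers."""
    return median(word_count(a) for a in answers)


def relative_change(baseline: float, new: float) -> float:
    """Signed change of `new` relative to `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Medians reported in the table: 173 words (base) and 82 words (Umay).
change = relative_change(173, 82)
print(f"{change:.1%}")
```

With the reported medians the change is about -52.6%, which rounds to the -53% shown in the table.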
Stability (Technical Domains)
Degenerate generation (repetition or collapse), which can appear in long technical answers, was eliminated in several critical categories. The values below are per-category degeneration rates on a 300-item internal test set.
| Category | aya-expanse-8b | Umay | Difference |
|---|---|---|---|
| Cybersecurity | 10% | 0% | −10 pts |
| Software Architecture | 3% | 0% | −3 pts |
| Argumentation & Ethics | 3% | 0% | −3 pts |
| Automation & Testing | 17% | 10% | −7 pts |
| Algorithms | 7% | 3% | −4 pts |
These results show that Umay delivers the most stable output among the compared models — including the base — particularly in cybersecurity and software architecture.
Content-accuracy and safety evaluations confirm that the model remains at the same level as the base model; the efficiency and stability gains were achieved without any reduction in accuracy or safety.
These measurements are based on a 300-item internal Turkish in-domain test set run under identical conditions to the base model, and are not results from an independent public leaderboard.
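The card does not state how degenerate answers were identified. One common heuristic for repetition is the share of duplicated word n-grams in an answer. The sketch below uses it, with n = 4 and a 0.5 threshold as illustrative assumptions rather than the settings behind the table. All function names are hypothetical.

```python
def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.split()
    total = len(words) - n + 1
    if total <= 0:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(total)]
    return 1.0 - len(set(ngrams)) / total


def is_degenerate(text: str, n: int = 4, threshold: float = 0.5) -> bool:
    """Flag an answer when most of its n-grams are repeats (assumed criterion)."""
    return repeated_ngram_fraction(text, n) > threshold


def degeneration_rate(answers: list[str]) -> float:
    """Share of answers flagged as degenerate."""
    if not answers:
        return 0.0
    return sum(is_degenerate(a) for a in answers) / len(answers)


healthy = "rate limiting can be implemented with a token bucket per client"
looping = "the same phrase again " * 20
print(degeneration_rate([healthy, looping]))
```

A heuristic like this catches looping output but not other forms of collapse, such as truncated or incoherent text, so it would normally be paired with manual review.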
Training Details
| Field | Value |
|---|---|
| Base model | CohereLabs/aya-expanse-8b (8B parameters) |
| Method | LoRA (PEFT) — supervised fine-tuning (SFT) |
| LoRA configuration | r = 64, alpha = 128, dropout = 0.05; target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Data | ~63,000 Turkish SFT rows (chat format; newline and character-encoding repaired) |
| Epochs | 2 |
| Validation loss | 0.984 → 0.953 |
| Hardware / precision | Single GPU, bf16 |
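To make the LoRA row of the table concrete, the following dependency-free sketch records those settings and derives two quantities from them. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / r and each adapted weight matrix of shape d_out × d_in gains r × (d_in + d_out) trainable parameters. `LoraSettings` and `lora_params` are hypothetical names, the `_proj` module names are my reading of the table's target-module list, and the 4096 × 4096 shape is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Values taken from the training table; the class itself is illustrative."""
    r: int = 64
    alpha: int = 128
    dropout: float = 0.05
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the update B @ A by alpha / r.
        return self.alpha / self.r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one weight matrix: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


settings = LoraSettings()
print(settings.scaling)
# Illustrative shape only; the real layer sizes are not given in this card.
print(lora_params(4096, 4096, settings.r))
```

With r = 64 and alpha = 128 the scaling factor is 2.0, and a square 4096-wide matrix would gain 524,288 trainable parameters.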
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "CohereLabs/aya-expanse-8b"  # gated: requires Hugging Face access approval
adapter_id = "ErenAta00/Umay-Aya-8B"

# Load the base model in bf16, then attach the Umay LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# Build the prompt with the base model's chat template.
messages = [
    {"role": "user", "content": "Bir REST API'de hız sınırlama (rate limiting) nasıl uygulanır? Kısaca açıkla."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample at low temperature and decode only the newly generated tokens.
output = model.generate(inputs, max_new_tokens=400, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
Limitations and Responsible Use
- License: Commercial use is not permitted under CC-BY-NC 4.0; Cohere's Acceptable Use Policy applies.
- Evaluation scope: Results are based on an internal in-domain test set and have not been validated against independent public leaderboards.
- Answer length: The model tends toward concise answers; for cases requiring detailed explanation, this should be stated explicitly in the prompt.
- Model size: This is an 8B-scale model. In domains requiring deep expertise (medical diagnosis, binding legal interpretation), it should be positioned as an assistant rather than a source of truth, and its outputs should be subject to human review.
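The answer-length note above suggests asking for detail explicitly when a longer answer is needed. A minimal sketch of that pattern follows. `build_messages` is a hypothetical helper, and the Turkish suffix ("Please explain step by step and in detail.") is only one possible wording, not a phrase recommended by the author.

```python
def build_messages(question: str, detailed: bool = False) -> list[dict]:
    """Build a single-turn chat message list, optionally requesting a detailed answer."""
    content = question
    if detailed:
        # Explicit request for depth, since the model defaults to concise output.
        content += " Lütfen adım adım ve ayrıntılı açıkla."
    return [{"role": "user", "content": content}]


messages = build_messages("Bir REST API'de hız sınırlama nasıl uygulanır?", detailed=True)
print(messages[0]["content"])
```

The resulting list has the same shape as the `messages` variable in the Usage section, so it can be passed to `tokenizer.apply_chat_template` in the same way.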
License
CC-BY-NC 4.0 and Cohere's Acceptable Use Policy (inherited from the base model).
Citation
@misc{umay2026,
  title  = {Umay-Aya-8B: Efficiency-Focused Supervised Fine-Tuning for Turkish},
  author = {Ata, Eren},
  year   = {2026},
  note   = {LoRA SFT on CohereLabs/aya-expanse-8b, CC-BY-NC 4.0}
}

@misc{dang2024ayaexpanse,
  title  = {Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier},
  author = {Dang, John and others},
  year   = {2024},
  eprint = {2412.04261}
}