Instructions for using ErenAta00/Umay-Aya-8B with libraries and local apps.

Libraries

How to use ErenAta00/Umay-Aya-8B with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("CohereLabs/aya-expanse-8b")
model = PeftModel.from_pretrained(base_model, "ErenAta00/Umay-Aya-8B")

Transformers

How to use ErenAta00/Umay-Aya-8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ErenAta00/Umay-Aya-8B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load the model directly, with a causal-LM head so that generate() is available
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ErenAta00/Umay-Aya-8B", dtype="auto")

Local Apps

vLLM

How to use ErenAta00/Umay-Aya-8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ErenAta00/Umay-Aya-8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ErenAta00/Umay-Aya-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
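
The same request can be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="ErenAta00/Umay-Aya-8B"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Once the server is running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply text. For a server on another port, pass a different `base_url`.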

Use Docker

docker model run hf.co/ErenAta00/Umay-Aya-8B

SGLang

How to use ErenAta00/Umay-Aya-8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ErenAta00/Umay-Aya-8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ErenAta00/Umay-Aya-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ErenAta00/Umay-Aya-8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ErenAta00/Umay-Aya-8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ErenAta00/Umay-Aya-8B with Docker Model Runner:
```
docker model run hf.co/ErenAta00/Umay-Aya-8B
```

Umay-Aya-8B

A concise, accurate, and reliable Turkish assistant.

License. This model is trained on top of CohereLabs/aya-expanse-8b and inherits the base model's license: CC-BY-NC 4.0 (non-commercial) and Cohere's Acceptable Use Policy. Derivative use is subject to these terms.

Overview

Umay is a language model built on Cohere Labs' multilingual aya-expanse-8b, trained for Turkish instruction-following and technical tasks. Its design priority is to produce direct, actionable, and uncluttered Turkish output rather than long, diffuse text.

The name comes from Umay, the goddess of fertility and protection in Turkic mythology — reflecting the model's goal of producing reliable and accurate responses.

The model was trained with LoRA on a carefully curated Turkish supervised fine-tuning (SFT) dataset of about 63,000 rows. The training data was repaired at the newline and character-encoding level to preserve the structural integrity of lists and code blocks.
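
The repair procedure itself is not described here. As a rough illustration of what newline and character-encoding repair can look like, the sketch below normalizes line endings and Unicode composition for one text field. The function name `repair_text` and the specific steps are assumptions, not the author's pipeline.

```python
import unicodedata


def repair_text(text):
    """Normalize newlines and Unicode form in one training-row string (illustrative)."""
    # Unify Windows (CRLF) and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Compose sequences such as "g" + combining breve into the single letter "ğ".
    text = unicodedata.normalize("NFC", text)
    # Trim trailing whitespace per line; leading indentation is kept for code blocks.
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))
```

Keeping leading whitespace untouched matters for indented code and nested lists, which is why only trailing whitespace is trimmed.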

Intended Use

Technical Q&A and explanation — direct, actionable answers in software architecture, cybersecurity, algorithms, cloud, and infrastructure.
Code generation and debugging — working code examples with short, accurate explanations.
Instruction following — output in specific formats (bullets, tables, step lists) and under specific constraints.
Professional and technical writing — concise Turkish text in a professional tone.
Cost-sensitive applications — lower output length reduces latency and per-request token cost.

Differentiation from the Base and Similar Models

Umay's distinguishing characteristic is that it produces answers of comparable quality in noticeably shorter form and delivers more stable output in technical domains. The results below come from an internal evaluation run with the same decoding settings as the base model, aya-expanse-8b.

Efficiency

On questions that both models answered correctly, the median word count per answer dropped from 173 to 82. The model completes the same work as the base model in roughly half the length, which translates to lower latency and lower output-token cost.

| Metric | aya-expanse-8b | Umay | Difference |
|---|---|---|---|
| Median answer length (words) | 173 | 82 | −53% |
| Median generation time | reference | not reported | −24% |
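
The −53% figure follows directly from the two medians in the table. A minimal check, where `relative_change` is an illustrative helper name:

```python
def relative_change(before, after):
    """Return the change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


base_median_words = 173  # aya-expanse-8b, from the table
umay_median_words = 82   # Umay, from the table

change = relative_change(base_median_words, umay_median_words)
print(f"{change:.1%}")  # prints -52.6%, which rounds to the reported -53%
```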

Stability (Technical Domains)

Degenerate generation (repetition or collapse), which can appear in long technical answers, dropped to zero in three categories and fell in the other two. The values below are the share of degenerate answers per category on a 300-item internal test set.

| Category | aya-expanse-8b | Umay | Difference |
|---|---|---|---|
| Cybersecurity | 10% | 0% | −10 pts |
| Software Architecture | 3% | 0% | −3 pts |
| Argumentation & Ethics | 3% | 0% | −3 pts |
| Automation & Testing | 17% | 10% | −7 pts |
| Algorithms | 7% | 3% | −4 pts |
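
The card does not say how degenerate answers were identified. One common heuristic is to measure how many word n-grams in an answer repeat an earlier n-gram. The sketch below is such a heuristic under assumed settings (4-grams, a 0.5 threshold); it is not the evaluation code behind the table.

```python
from collections import Counter


def repeated_ngram_ratio(text, n=4):
    """Fraction of word n-grams that repeat an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


def looks_degenerate(text, n=4, threshold=0.5):
    """Flag an answer when at least `threshold` of its n-grams are repeats."""
    return repeated_ngram_ratio(text, n) >= threshold


def degeneration_rate(answers):
    """Share of answers flagged as degenerate, as a fraction between 0 and 1."""
    if not answers:
        return 0.0
    return sum(looks_degenerate(answer) for answer in answers) / len(answers)
```

A per-category rate like those in the table would come from calling `degeneration_rate` on the answers of one category at a time.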

In this evaluation Umay had the lowest degeneration rate among the compared models, including the base, with the clearest gains in cybersecurity and software architecture. Only the base model's figures are tabulated here.

In the same internal evaluation, content accuracy and safety stayed at the base model's level, so the efficiency and stability gains did not come at the expense of either.

These measurements are based on a 300-item internal Turkish in-domain test set run under identical conditions to the base model, and are not results from an independent public leaderboard.

Training Details

| Field | Value |
|---|---|
| Base model | `CohereLabs/aya-expanse-8b` (8B parameters) |
| Method | LoRA (PEFT), supervised fine-tuning (SFT) |
| LoRA configuration | r = 64, alpha = 128, dropout = 0.05; target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Data | ~63,000 Turkish SFT rows (chat format; newline and character-encoding repaired) |
| Epochs | 2 |
| Validation loss | 0.984 → 0.953 |
| Hardware / precision | Single GPU, bf16 |
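
For readers who want to reproduce a similar setup, the LoRA row can be written as a PEFT configuration. This is a sketch: it assumes the short module names in the table stand for the `*_proj` layers, and `task_type` is an assumption the table does not state.

```python
# Values taken from the training table; key names follow peft.LoraConfig.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}

# With standard LoRA scaling, the adapter update is multiplied by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

try:
    from peft import LoraConfig
    config = LoraConfig(**lora_kwargs)
except ImportError:
    config = None  # peft is not installed; the dict above still records the values
```

With alpha = 128 and r = 64 the scaling factor is 2.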

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "CohereLabs/aya-expanse-8b"        # gated — requires Hugging Face access approval
adapter_id = "ErenAta00/Umay-Aya-8B"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [
    {"role": "user", "content": "Bir REST API'de hız sınırlama (rate limiting) nasıl uygulanır? Kısaca açıkla."}
]
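# Ask for a dict so that both input_ids and attention_mask are returned.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Low-temperature sampling; max_new_tokens caps the length of the reply.
output = model.generate(**inputs, max_new_tokens=400, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))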

Limitations and Responsible Use

License: Commercial use is not permitted under CC-BY-NC 4.0; Cohere's Acceptable Use Policy applies.
Evaluation scope: Results are based on an internal in-domain test set and have not been validated against independent public leaderboards.
Answer length: The model tends toward concise answers; for cases requiring detailed explanation, this should be stated explicitly in the prompt.
Model size: This is an 8B-scale model. In domains requiring deep expertise (medical diagnosis, binding legal interpretation), it should be positioned as an assistant rather than a source of truth, and its outputs should be subject to human review.
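
Because the model tends toward concise answers, a request for more detail can be appended to the user message. A minimal sketch, where `with_detail_request` is a hypothetical helper and the Turkish wording is only one possible phrasing:

```python
def with_detail_request(question, detailed=False):
    """Build a chat `messages` list, optionally asking for a longer answer."""
    if detailed:
        # Turkish for: "Explain in detail, step by step."
        question = f"{question} Ayrıntılı olarak, adım adım açıkla."
    return [{"role": "user", "content": question}]


messages = with_detail_request(
    "Bir REST API'de hız sınırlama (rate limiting) nasıl uygulanır?",
    detailed=True,
)
```

The resulting `messages` list can be passed to `tokenizer.apply_chat_template` in the same way as in the Usage section.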

License

CC-BY-NC 4.0 and Cohere's Acceptable Use Policy (inherited from the base model).

Citation

@misc{umay2026,
  title  = {Umay-Aya-8B: Efficiency-Focused Supervised Fine-Tuning for Turkish},
  author = {Ata, Eren},
  year   = {2026},
  note   = {LoRA SFT on CohereLabs/aya-expanse-8b, CC-BY-NC 4.0}
}

@misc{dang2024ayaexpanse,
  title  = {Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier},
  author = {Dang, John and others},
  year   = {2024},
  eprint = {2412.04261}
}
