Kona2-small-3.8B

Kona2-small-3.8B is a compact, 3.8-billion-parameter Georgian language model built on Microsoft Phi-3.5-mini-instruct. It goes through the same full training pipeline as the 12B models (pre-training, SFT, and DPO) and offers Georgian language capabilities at significantly lower compute requirements.

Model Summary

| Property | Value |
| --- | --- |
| Parameters | 3.8B |
| Architecture | Phi-3 (Transformer) |
| Context Length | 8K tokens |
| Languages | Georgian (ka), English (en), other (limited) |
| Training | Full pipeline (Pre-training + SFT + DPO) |
| Vocabulary | Extended (~20K Georgian tokens) |
| Base Model | microsoft/Phi-3.5-mini-instruct |

Intended Uses

Primary Use Cases

  • Edge deployment and mobile applications
  • Low-latency conversational AI
  • Georgian text generation on consumer hardware
  • Translation (especially strong)
  • Educational and research purposes
  • Rapid prototyping and development

Training

Training Pipeline

Same full pipeline as the 12B models, applied to the smaller Phi-3.5 base:

  1. Vocabulary Expansion: Added ~20K Georgian tokens (fertility of 1.9 tokens per word)
  2. Continued Pre-training: LoRA/DoRA on a Georgian/English corpus
  3. SFT (Supervised Fine-Tuning): Instruction tuning on Georgian instructions
  4. DPO (Direct Preference Optimization): Preference alignment for better responses
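Tokenizer fertility, as quoted in step 1, is usually computed as the number of tokens produced divided by the number of words in the text. The sketch below assumes whitespace-separated words and uses a toy stand-in tokenizer (`toy_tokenize`, hypothetical) so it runs without downloading the model; how the 1.9 figure was actually measured is not stated here.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: cuts each word into 2-character pieces."""
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return pieces


sample = ["გამარჯობა მსოფლიო"]
print(f"fertility: {fertility(sample, toy_tokenize):.2f} tokens/word")
```

To measure the real tokenizer, pass `tokenizer.tokenize` from a loaded `AutoTokenizer` in place of `toy_tokenize` and run it over a representative Georgian corpus.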

Training Configuration

  • Base Model: microsoft/Phi-3.5-mini-instruct
  • Method: LoRA with DoRA enabled
  • Pre-training Context: 8K tokens
  • New Tokens: ~20K Georgian tokens
  • Precision: BF16
  • Infrastructure: NVIDIA H100 GPUs

Usage

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "tbilisi-ai-lab/kona2-small-3.8B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True  # Required for Phi-3 architecture
)
tokenizer = AutoTokenizer.from_pretrained(
    "tbilisi-ai-lab/kona2-small-3.8B",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "გამარჯობა! რა არის ხელოვნური ინტელექტი?"}
]

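# Build the prompt with the model's chat template; return_dict=True yields
# both input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))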

With Ollama (Local Deployment)

# If using GGUF quantized version
ollama run kona2-small
>>> გამარჯობა!
გამარჯობა! როგორ შემიძლია დაგეხმარო?
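
The same local model can also be called from code through Ollama's HTTP chat endpoint. This is a minimal sketch using only the standard library; it assumes an Ollama server is running on its default port (11434) and that the model is available under the name `kona2-small`, as in the command above. The helper names `build_chat_payload` and `chat` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_payload(prompt: str, model: str = "kona2-small") -> dict:
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON response
    }


def chat(prompt: str, model: str = "kona2-small", timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["message"]["content"]
```

With the server running, `chat("გამარჯობა!")` returns the model's reply as a string.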

Comparison with 12B Models

| Feature | Kona2-small-3.8B | Kona2-12B |
| --- | --- | --- |
| Parameters | 3.8B | 12B |
| VRAM (FP16) | ~8GB | ~24GB |
| VRAM (4-bit) | ~3GB | ~8GB |
| Speed | Faster | Slower |
| Quality | Good | Better |
| Function Calling | Basic | Full |
| Reasoning | Limited | Strong |
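
The VRAM rows can be sanity-checked with weights-only arithmetic: parameters times bits per parameter, divided by eight. The sketch below does that calculation. Reading the gap between these numbers and the table as headroom for the KV cache, activations, and quantization overhead is an assumption, not something the figures above state.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, params in [("Kona2-small-3.8B", 3.8e9), ("Kona2-12B", 12e9)]:
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{name}: FP16 weights ~{fp16:.1f} GB, 4-bit weights ~{int4:.1f} GB")
```

For the 3.8B model this gives about 7.6 GB at FP16 and 1.9 GB at 4-bit, below the ~8GB and ~3GB listed in the table.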

When to Use Kona2-small

  • Running on consumer GPUs (RTX 3060, RTX 4070, etc.)
  • Mobile or edge deployment
  • High-throughput, low-latency requirements
  • Simple Q&A and chat applications
  • Development and prototyping

When to Use Kona2-12B

  • Production applications requiring high quality
  • Complex reasoning tasks
  • Reliable function calling
  • When compute resources are available

Related Models

| Model | Parameters | Use Case |
| --- | --- | --- |
| kona2-12B | 12B | Production (DPO-aligned) |
| kona2-12B-Instruct | 12B | Production (SFT) |
| kona2-12B-Base | 12B | Fine-tuning |

Limitations

  • Training data cutoff: 2024

Technical Specifications

  • Precision: BF16/FP16 supported
  • Minimum VRAM: 8GB (FP16), 3GB (4-bit)
  • Custom Code: Required (trust_remote_code=True)

Citation

@misc{tbilisi2025kona2small,
  title        = {Kona2-small-3.8B: A Compact Georgian Language Model},
  author       = {Tbilisi AI Lab Team},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B}}
}

License

This model is released under the Apache 2.0 License.

Contact
