How to use tbilisi-ai-lab/kona2-small-3.8B with libraries and local apps.

Libraries

How to use tbilisi-ai-lab/kona2-small-3.8B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tbilisi-ai-lab/kona2-small-3.8B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tbilisi-ai-lab/kona2-small-3.8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("tbilisi-ai-lab/kona2-small-3.8B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use tbilisi-ai-lab/kona2-small-3.8B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tbilisi-ai-lab/kona2-small-3.8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tbilisi-ai-lab/kona2-small-3.8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
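
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "tbilisi-ai-lab/kona2-small-3.8B"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` should return the assistant's reply as a string. Passing a different `base_url` points the same helper at another OpenAI-compatible endpoint.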

Use Docker

docker model run hf.co/tbilisi-ai-lab/kona2-small-3.8B

SGLang

How to use tbilisi-ai-lab/kona2-small-3.8B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tbilisi-ai-lab/kona2-small-3.8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tbilisi-ai-lab/kona2-small-3.8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tbilisi-ai-lab/kona2-small-3.8B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tbilisi-ai-lab/kona2-small-3.8B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use tbilisi-ai-lab/kona2-small-3.8B with Docker Model Runner:
```
docker model run hf.co/tbilisi-ai-lab/kona2-small-3.8B
```

Kona2-small-3.8B

Kona2-small-3.8B is a compact 3.8-billion-parameter Georgian language model built on Microsoft's Phi-3.5-mini-instruct. It goes through the same full training pipeline as the 12B Kona2 models (continued pre-training, SFT, and DPO) and provides Georgian language capabilities at significantly lower compute cost.

Model Summary

| Property | Value |
| --- | --- |
| Parameters | 3.8B |
| Architecture | Phi-3 (Transformer) |
| Context Length | 8K tokens |
| Languages | Georgian (ka), English (en), others (limited) |
| Training | Full pipeline (Pre-training + SFT + DPO) |
| Vocabulary | Extended (~20K Georgian tokens) |
| Base Model | microsoft/Phi-3.5-mini-instruct |

Intended Uses

Primary Use Cases

Edge deployment and mobile applications
Low-latency conversational AI
Georgian text generation on consumer hardware
Translation (especially strong)
Educational and research purposes
Rapid prototyping and development

Training

Training Pipeline

Same full pipeline as the 12B models, applied to the smaller Phi-3.5 base:

Vocabulary Expansion: Added ~20K Georgian tokens (tokenizer fertility of 1.9 tokens per word)
Continued Pre-training: LoRA/DoRA on a Georgian/English corpus
SFT (Supervised Fine-Tuning): Instruction tuning on Georgian instructions
DPO (Direct Preference Optimization): Preference alignment for better responses
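
Fertility in the first step is the average number of tokens the tokenizer emits per word, so lower values mean shorter sequences for the same text. A minimal sketch of how one might measure it, assuming whitespace-separated words and any callable that maps text to a list of tokens. The toy `char_pair_tokenize` is a stand-in for illustration only, not the Kona2 tokenizer.

```python
def fertility(texts, tokenize):
    """Average tokens per whitespace-separated word over a corpus."""
    n_words = sum(len(t.split()) for t in texts)
    n_tokens = sum(len(tokenize(t)) for t in texts)
    if n_words == 0:
        raise ValueError("corpus contains no words")
    return n_tokens / n_words


def char_pair_tokenize(text):
    """Toy tokenizer: split each word into chunks of two characters."""
    return [w[i:i + 2] for w in text.split() for i in range(0, len(w), 2)]


sample = ["abcd ef", "ghijkl"]
print(fertility(sample, char_pair_tokenize))  # 6 tokens / 3 words -> 2.0
```

To measure a real tokenizer, pass its `tokenize` method (for example from a `transformers` `AutoTokenizer`) in place of the toy function and use a representative Georgian corpus.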

Training Configuration

Base Model: microsoft/Phi-3.5-mini-instruct
Method: LoRA with DoRA enabled
Pre-training Context: 8K tokens
New Tokens: ~20K Georgian tokens
Precision: BF16
Infrastructure: NVIDIA H100 GPUs
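
To give a sense of why LoRA with DoRA keeps training cheap, the sketch below counts trainable parameters for a single linear layer. The rank and layer size are illustrative values, not the settings used for Kona2, and the DoRA count assumes one learned magnitude per output feature on top of the LoRA matrices.

```python
def lora_params(in_features, out_features, rank):
    """Trainable parameters LoRA adds to one linear layer: A (rank x in) + B (out x rank)."""
    return rank * (in_features + out_features)


def dora_params(in_features, out_features, rank):
    """LoRA parameters plus one learned magnitude per output feature (assumed layout)."""
    return lora_params(in_features, out_features, rank) + out_features


# Illustrative numbers only: a square 3072 x 3072 projection and rank 16.
full = 3072 * 3072
lora = lora_params(3072, 3072, 16)
dora = dora_params(3072, 3072, 16)
print(f"full: {full:,}  LoRA: {lora:,}  DoRA: {dora:,}")
print(f"DoRA trains {dora / full:.2%} of the layer's weights")
```

Under these assumptions the adapter trains roughly 1% of the layer's weights, which is why the method fits on far less memory than full fine-tuning.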

Usage

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "tbilisi-ai-lab/kona2-small-3.8B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True  # Required for Phi-3 architecture
)
tokenizer = AutoTokenizer.from_pretrained(
    "tbilisi-ai-lab/kona2-small-3.8B",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "გამარჯობა! რა არის ხელოვნური ინტელექტი?"}
]

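inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))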

With Ollama (Local Deployment)

# If using GGUF quantized version
ollama run kona2-small

>>> გამარჯობა!
გამარჯობა! როგორ შემიძლია დაგეხმარო?

Comparison with 12B Models

| Feature | Kona2-small-3.8B | Kona2-12B |
| --- | --- | --- |
| Parameters | 3.8B | 12B |
| VRAM (FP16) | ~8GB | ~24GB |
| VRAM (4-bit) | ~3GB | ~8GB |
| Speed | Faster | Slower |
| Quality | Good | Better |
| Function Calling | Basic | Full |
| Reasoning | Limited | Strong |
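
The VRAM rows can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below computes that lower bound. It counts weights only, so activations, the KV cache, and quantization metadata are not included, which is presumably why the 4-bit figures in the table sit above the raw numbers.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


for name, n in [("Kona2-small-3.8B", 3.8e9), ("Kona2-12B", 12e9)]:
    fp16 = weight_memory_gb(n, 16)
    int4 = weight_memory_gb(n, 4)
    print(f"{name}: FP16 ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB (weights only)")
```

This gives about 7.6 GB (FP16) and 1.9 GB (4-bit) for the 3.8B model, and 24.0 GB and 6.0 GB for the 12B model.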

When to Use Kona2-small

Running on consumer GPUs (RTX 3060, RTX 4070, etc.)
Mobile or edge deployment
High-throughput, low-latency requirements
Simple Q&A and chat applications
Development and prototyping

When to Use Kona2-12B

Production applications requiring high quality
Complex reasoning tasks
Reliable function calling
When compute resources are available

Related Models

| Model | Parameters | Use Case |
| --- | --- | --- |
| kona2-12B | 12B | Production (DPO-aligned) |
| kona2-12B-Instruct | 12B | Production (SFT) |
| kona2-12B-Base | 12B | Fine-tuning |

Limitations

Training data cutoff: 2024

Technical Specifications

Precision: BF16/FP16 supported
Minimum VRAM: 8GB (FP16), 3GB (4-bit)
Custom Code: Required (trust_remote_code=True)
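
One possible way to reach the 4-bit footprint is on-the-fly quantization with bitsandbytes through Transformers. This is a sketch under assumptions: it needs `pip install bitsandbytes` and a CUDA GPU, the NF4 settings are common defaults rather than values recommended by this model card, and the card does not state whether the checkpoint has been validated with bitsandbytes. The function name `load_in_4bit` is illustrative.

```python
MODEL_ID = "tbilisi-ai-lab/kona2-small-3.8B"


def load_in_4bit(model_id=MODEL_ID):
    """Load the model with 4-bit NF4 weights via bitsandbytes (CUDA GPU assumed)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumed setting, not from the card
        bnb_4bit_compute_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
        trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer
```

Calling `model, tokenizer = load_in_4bit()` returns objects that can be used with the same chat-template and `generate` calls shown in Basic Usage.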

Citation

@misc{tbilisi2025kona2small,
  title        = {Kona2-small-3.8B: A Compact Georgian Language Model},
  author       = {Tbilisi AI Lab Team},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B}}
}