Instructions for using nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with libraries and local apps.

Libraries

How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="nectec/pathumma-thaillm-8b-think-3.0.0-GGUF",
	filename="pathumma-thaillm-8b-think-3-Q4_K_M.gguf",
)

response = llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "ทำไมวงกลมถึงมี 360 องศา"}
	]
)
print(response["choices"][0]["message"]["content"])

Local Apps

llama.cpp

How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Use Docker

docker model run hf.co/nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
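
Once `llama-server` is running, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library and assumes the server listens on its default address, http://localhost:8080 (the same base URL that appears in the Pi configuration in this document). The helper names `build_chat_request` and `chat` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,  # illustrative cap; a thinking model may need more
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("ทำไมวงกลมถึงมี 360 องศา"))` should print the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a running server.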

LM Studio
Jan
Ollama
How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with Ollama:
```
ollama run hf.co/nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
```

Unsloth Studio

How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nectec/pathumma-thaillm-8b-think-3.0.0-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nectec/pathumma-thaillm-8b-think-3.0.0-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nectec/pathumma-thaillm-8b-think-3.0.0-GGUF to start chatting

Pi

How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Run Hermes

hermes

Atomic Chat
Docker Model Runner
How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with Docker Model Runner:
```
docker model run hf.co/nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M
```

Lemonade

How to use nectec/pathumma-thaillm-8b-think-3.0.0-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull nectec/pathumma-thaillm-8b-think-3.0.0-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.pathumma-thaillm-8b-think-3.0.0-GGUF-Q4_K_M

List all available models

lemonade list

Pathumma-ThaiLLM-Think-3.0.0

Pathumma-ThaiLLM-Think-3.0.0 is a post-trained Thai large language model built on the foundation model from the Thai national initiative ThaiLLM.

This release applies multi-stage Supervised Fine-Tuning (SFT) to enhance:

Instruction following
Structured tool / function calling
Mathematical and coding competence
Multi-step analytical capability
Thai–English bilingual robustness

Training Strategy

Post-training is organized into two stages:

Stage 1: Instruction & Tool-Calling Alignment
Stage 2: Reasoning Specialization

For selected corpora, only curated subsets were used to maintain domain balance.

Stage 1: Instruction & Tool-Calling Alignment

Focus areas:

Instruction compliance
Structured tool-call formatting
General Thai task robustness
STEM-oriented instruction alignment

Datasets

Dataset	Training Subset Size	Full Dataset Size	Domain	License
beyoru/ToolCall_synthetic_qwen3	60,000	60,000	Tool	Apache-2.0
airesearch/WangchanX-FLAN-v6	2,000,000	13,619,450	General	Mixed
nvidia/OpenMathInstruct-2	1,000,000	14,000,000	STEM	CC-BY-4.0
jdaddyalbs/playwright-mcp-toolcalling	1,750	1,750	Tool	MIT
BitAgent/tool_calling	551,000	551,000	Tool	MIT

Stage 2: Reasoning Specialization

Focus areas:

Multi-step mathematical analysis
Code understanding and synthesis
Structured analytical responses
Tool-calling with explicit reasoning traces
Thai reasoning distillation

Datasets

Dataset	Training Subset Size	Full Dataset Size	Domain	License
nvidia/OpenMathReasoning	500,000	4,920,000	STEM	CC-BY-4.0
nvidia/OpenCodeReasoning	585,000	585,000	Coding	CC-BY-4.0
natolambert/GeneralThought-430K-filtered	337,579	337,579	General	MIT
Jofthomas/hermes-function-calling-thinking-V1	3,570	3,570	Tool	MIT
open-thoughts/OpenThoughts3-1.2M	1,200,000	1,200,000	STEM	Apache-2.0
scb10x/typhoon-r1-sft-data	23,851	23,851	General	Custom
iapp/Thai-R1-Distill-SFT	10,000	10,000	General	Custom
nvidia/Nemotron-Post-Training-Dataset-v1	310,000	310,000	Tool	CC-BY-4.0

Note: For selected datasets, curated subsets were employed to ensure balanced domain representation.
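
To see what "balanced domain representation" looks like in practice, the training subset sizes from the two dataset tables can be tallied per domain. The sketch below is only a rough view: it counts examples, not tokens, and the function name `domain_mix` is illustrative.

```python
from collections import defaultdict

# (dataset, training subset size, domain), copied from the Stage 1 table.
STAGE_1 = [
    ("beyoru/ToolCall_synthetic_qwen3", 60_000, "Tool"),
    ("airesearch/WangchanX-FLAN-v6", 2_000_000, "General"),
    ("nvidia/OpenMathInstruct-2", 1_000_000, "STEM"),
    ("jdaddyalbs/playwright-mcp-toolcalling", 1_750, "Tool"),
    ("BitAgent/tool_calling", 551_000, "Tool"),
]

# Same columns, copied from the Stage 2 table.
STAGE_2 = [
    ("nvidia/OpenMathReasoning", 500_000, "STEM"),
    ("nvidia/OpenCodeReasoning", 585_000, "Coding"),
    ("natolambert/GeneralThought-430K-filtered", 337_579, "General"),
    ("Jofthomas/hermes-function-calling-thinking-V1", 3_570, "Tool"),
    ("open-thoughts/OpenThoughts3-1.2M", 1_200_000, "STEM"),
    ("scb10x/typhoon-r1-sft-data", 23_851, "General"),
    ("iapp/Thai-R1-Distill-SFT", 10_000, "General"),
    ("nvidia/Nemotron-Post-Training-Dataset-v1", 310_000, "Tool"),
]


def domain_mix(rows):
    """Return {domain: (example count, share of the stage total)}."""
    totals = defaultdict(int)
    for _name, size, domain in rows:
        totals[domain] += size
    stage_total = sum(totals.values())
    return {d: (n, n / stage_total) for d, n in totals.items()}


for label, rows in (("Stage 1", STAGE_1), ("Stage 2", STAGE_2)):
    print(label)
    for domain, (count, share) in sorted(domain_mix(rows).items()):
        print(f"  {domain:<8}{count:>10,}  {share:6.1%}")
```

With the figures listed in the tables, Stage 1 sums to 3,612,750 training examples and Stage 2 to 2,970,000.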

Methodology

Base model: ThaiLLM foundation model
Training objective: Supervised Fine-Tuning (SFT)
Two-stage curriculum design
Domain-balanced optimization
Tool-call schema alignment
Thai reasoning distillation

Compute Infrastructure

Training was conducted on the LANTA high-performance computing cluster, utilizing 16 nodes (64×A100 40GB GPUs) for distributed large-scale post-training.

Capabilities

Thai instruction compliance
Structured JSON tool invocation
Mathematical problem solving
Code generation and analysis
Multi-step analytical tasks
Thai–English bilingual support

Limitations

May hallucinate if tool schema is incomplete
Performance on long analytical chains may degrade without retrieval
Domain coverage depends on included corpora

Quickstart

Qwen3 support is included in recent releases of Hugging Face transformers, and we advise you to use the latest version. With transformers<4.51.0, loading the model fails with the following error:

KeyError: 'qwen3'

The following code snippet illustrates how to use the model to generate content from a given input.

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "nectec/pathumma-thaillm-8b-think-3.0.0"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "ทำไมวงกลมถึงมี 360 องศา"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 
# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("thinking content:", thinking_content) # no opening <think> tag
print("content:", content)
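
When the output arrives as a decoded string rather than token IDs (for example, from an OpenAI-compatible server), the same split can be done at the text level. This is a minimal sketch, assuming the closing `</think>` tag appears literally in the text; `split_thinking` is an illustrative helper and not part of transformers.

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (thinking, answer) at the last closing tag."""
    thinking, tag, answer = text.rpartition(close_tag)
    if not tag:
        # No closing tag found: treat the whole text as the answer,
        # mirroring the index = 0 fallback in the token-level snippet.
        return "", text.strip("\n")
    # Drop a leading <think> tag if one is present in the text.
    thinking = thinking.strip("\n").removeprefix("<think>").strip("\n")
    return thinking, answer.strip("\n")
```

Using `rpartition` matches the token-level code, which searches for the last occurrence of the closing tag rather than the first.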

For deployment, you can use vllm>=0.8.5 to create an OpenAI-compatible API endpoint:

vllm serve nectec/pathumma-thaillm-8b-think-3.0.0 \
  --enforce-eager \
  --no-enable-chunked-prefill \
  --tool-call-parser hermes
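
Once the endpoint is up, tool definitions can be sent in the OpenAI `tools` format. The sketch below only builds the request body and decodes a response; it assumes vLLM's default address http://localhost:8000, and the `get_weather` tool is purely hypothetical. Depending on the vLLM version, automatic tool selection may also require starting the server with `--enable-auto-tool-choice`, so check the vLLM documentation before relying on this.

```python
import json

# Assumption: vLLM is serving on its default port 8000.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

# Hypothetical tool, described with a JSON Schema as in the OpenAI tools format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Return a chat completions body that offers one tool to the model."""
    return {
        "model": "nectec/pathumma-thaillm-8b-think-3.0.0",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [GET_WEATHER_TOOL],
        "tool_choice": "auto",
    }


def parse_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Extract (tool name, decoded arguments) pairs from a response body."""
    calls = response["choices"][0]["message"].get("tool_calls") or []
    return [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]
```

The body from `build_tool_request` can be POSTed as JSON to `ENDPOINT` with any HTTP client. An empty list from `parse_tool_calls` means the model answered in plain text instead of calling a tool.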

For local use, applications such as Ollama, LM Studio, and llama.cpp are also supported.

About the Project

Pathumma-ThaiLLM-Think-3.0.0 is part of ongoing research toward sovereign Thai large language models optimized for analytical and tool-augmented intelligence.