
magos-k8s-0.6b

magos-k8s-0.6b is a 0.6B-parameter reasoning model for Kubernetes diagnostics, derived from Qwen3-0.6B. It is trained in two full-weight stages: continued pre-training (CPT) on Kubernetes documentation, the v1.34 API reference (every resource Kind), the kubectl command reference, and Prometheus alert runbooks; followed by supervised fine-tuning (SFT) on event→YAML diagnostic pairs. Each response is a structured <think> reasoning trace followed by a concise answer — a kubectl/promtool command, a YAML patch, or a root cause plus fix.

Scope and design

The model targets a narrow task: mapping a Kubernetes symptom (a failed or Warning condition, a kubectl describe/events excerpt, a misconfigured manifest) to the responsible spec field and the corrective action. The reasoning trace is intentionally short and templated (implicated condition → spec field → verdict → fix / next command) rather than open-ended chain-of-thought — that is the form a 0.6B model reproduces reliably without drifting into invented detail.

Because every response terminates in a concrete next action, the model fits as the inner-loop reasoner of a planner→executor devops agent. It is full-weight fine-tuned (no LoRA/adapters), ships as bf16 safetensors plus GGUF quantizations, and runs locally at ~640 MB (Q8). Knowledge is frozen at the training snapshot; treat the model as a reasoning component, not a source of truth, and verify field/flag specifics against current docs or a live kubectl explain.
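
In an agent loop, the executor usually needs the final answer separated from the reasoning trace. A minimal sketch of that split, assuming the response contains at most one <think>...</think> block followed by the answer; the function name and the sample text are illustrative, not part of the model's tooling:

```python
import re
from typing import Tuple

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_response(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

# Made-up response in the templated shape described above.
sample = (
    "<think>FailedScheduling -> resources.requests.memory -> request too high</think>\n"
    "Lower the memory request."
)
reasoning, answer = split_response(sample)
print(answer)
```

A planner can log `reasoning` for auditing and pass only `answer` to the executor.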

What's new in v16 (current stable)

v16 is trained on the largest and broadest corpus yet: ~108k <think> reasoning examples, all derived from the official Kubernetes sources and built so that scenarios are only ever phrased around verified facts (every YAML field is checked against the v1.34 OpenAPI schema; every flag against the kubectl reference). It combines two tracks:

  • Event-grounded diagnostic matched pairs (the v15 design): a BROKEN case (failed/Warning events ↔ the exact offending YAML field) and a HEALTHY case (clean events ↔ the same field set correctly), across ~80 failure subcategories (scheduling, image, crashloop, probes, volumes, networking, RBAC/PodSecurity, controllers, quota/limits, …).
  • Command-reference: correct kubectl invocations across ~45 subcommands and their flags.

Every answer is a short, structured <think> chain (events → correlate to field → verdict → fix, or goal → command) followed by a concise YAML patch or command — the form a 0.6B model reproduces reliably without drifting into invented detail.
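
The matched-pair idea can be pictured as a small data structure. The sketch below is hypothetical: the actual training-data schema is not published in this card, so every class name, field name, and sample value here is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DiagnosticCase:
    events: Tuple[str, ...]   # excerpt of `kubectl get events` / `describe` output
    field_path: str           # the spec field the events are correlated to
    field_value: str          # the value of that field in this case
    verdict: str              # "BROKEN" or "HEALTHY"

@dataclass(frozen=True)
class MatchedPair:
    subcategory: str
    broken: DiagnosticCase
    healthy: DiagnosticCase

    def is_consistent(self) -> bool:
        """Both cases must point at the same field, with opposite verdicts."""
        return (
            self.broken.field_path == self.healthy.field_path
            and self.broken.verdict == "BROKEN"
            and self.healthy.verdict == "HEALTHY"
        )

# Illustrative pair for the scheduling subcategory (values are made up).
pair = MatchedPair(
    subcategory="scheduling",
    broken=DiagnosticCase(
        events=("Warning FailedScheduling 0/3 nodes are available: 3 Insufficient memory.",),
        field_path="spec.containers[0].resources.requests.memory",
        field_value="64Gi",
        verdict="BROKEN",
    ),
    healthy=DiagnosticCase(
        events=("Normal Scheduled Successfully assigned default/web to node-1",),
        field_path="spec.containers[0].resources.requests.memory",
        field_value="256Mi",
        verdict="HEALTHY",
    ),
)
print(pair.is_consistent())
```

Pairing the same field under both verdicts is what lets the model learn the correlation between events and field, rather than memorizing that a field is always wrong.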

| | v15 | v16 |
|---|---|---|
| Corpus | ~16.6k diagnostic | ~108k (diagnostic + command-reference) |
| Coverage | ~80 diagnostic subcategories | + ~45 kubectl subcommands/flags |
| Recipe | 4 epochs · LR 2e-5 · batch 32 | 4 epochs · LR 2e-5 · batch 32 |

Strengths: diagnosing from pasted events/describe output, YAML generation/review, and structured next-step reasoning. It is full-weight fine-tuned (no LoRA), schema-grounded, and low-hallucination by construction.

To pin a specific version when loading:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b", revision="v16")
# or revision="v15" / "v8" / "v7" / "v6" / "v5" / "v3" / "v2" for previous versions

What it's good at

  • Diagnosing from events — paste kubectl get events / kubectl describe output and it correlates the failure to the responsible YAML field + fix.
  • YAML manifest generation and review — a top strength; correct apiVersion/field names across Pod, Deployment, Service, NetworkPolicy, PVC, HPA, Ingress, RBAC and many other Kinds (schema-validated training set).
  • kubectl command construction — broad subcommand/flag coverage from the reference (the v16 command-reference track).
  • Prometheus alert handling — meaning + diagnostic steps for the prometheus-operator runbook set.
  • Structured next-step reasoning — short <think> that ends in a concrete command or fix, suitable as an agent's inner-loop reasoner.
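
For the event-diagnosis use case, a pasted excerpt has to be wrapped into a chat message. The helper below is one possible sketch; its name, the optional manifest section, and the closing question are assumptions modelled on the example prompts in this card, not a documented prompt format.

```python
from typing import Dict, List, Optional

def build_diagnostic_prompt(describe_excerpt: str,
                            manifest: Optional[str] = None) -> List[Dict[str, str]]:
    """Wrap a kubectl describe/events excerpt (and optionally a manifest) in a user message."""
    parts = ["kubectl describe pod shows:", describe_excerpt.strip()]
    if manifest:
        parts += ["Manifest:", manifest.strip()]
    parts.append("Why is this failing, and which spec field should change?")
    return [{"role": "user", "content": "\n".join(parts)}]

messages = build_diagnostic_prompt(
    "Warning FailedScheduling 0/3 nodes are available: 3 Insufficient memory."
)
print(messages[0]["content"])
```

The returned list can be passed as `messages` to either of the inference examples under "How to use".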

What it's not good at

  • Multi-step planning or complex tool chains — it's a 0.6B model.
  • Subtle/rare flags and multi-flag combinations — verify with kubectl --help.
  • General (non-Kubernetes) reasoning — this corpus is K8s-focused.
  • Knowledge of features released after the source docs were captured (mid-2026).

How to use

Important — sampling: v16 is a reasoning model. Run it greedy with repetition_penalty = 1.0. A repetition penalty > 1.0 penalizes the prompt words the <think> block needs to reference and collapses it to an empty <think></think>. (This differs from the terse v8, which used temp 0.05 / rep 1.15.)
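
To see why a penalty above 1.0 can hurt here, the sketch below applies the common CTRL-style repetition penalty (positive logits of already-seen tokens are divided by the penalty, negative ones multiplied) to a toy three-token vocabulary. The function names, logits, and token ids are made up for illustration; a given inference stack may implement the penalty differently.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of `logits` with a CTRL-style repetition penalty applied."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

# Token 0 stands for a word from the prompt that the <think> block should repeat;
# token 1 is a slightly less likely alternative; token 2 is unlikely.
logits = [2.0, 1.9, -1.0]
prompt_token_ids = [0]

greedy_no_penalty = argmax(apply_repetition_penalty(logits, prompt_token_ids, 1.0))
greedy_with_penalty = argmax(apply_repetition_penalty(logits, prompt_token_ids, 1.15))
print(greedy_no_penalty, greedy_with_penalty)  # 0 1
```

With the penalty at 1.0 the prompt token stays the greedy choice; at 1.15 its score drops below the alternative, which is the mechanism that pushes the model away from quoting the symptom it is supposed to reason about.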

llama.cpp / Ollama / LM Studio

| File | Size | Quality |
|---|---|---|
| magos-k8s-0.6b-f16.gguf | ~1.2 GB | reference (full precision) |
| magos-k8s-0.6b-q8_0.gguf | ~640 MB | effectively identical to f16; recommended |
| magos-k8s-0.6b-q4_k_m.gguf | ~400 MB | smallest; more field/flag mistakes; fine for casual use |

from llama_cpp import Llama

llm = Llama(model_path="magos-k8s-0.6b-q8_0.gguf", n_ctx=4096, chat_format="chatml")
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content":
        "kubectl describe pod shows: Warning FailedScheduling 0/3 nodes are available: 3 Insufficient memory. Why?"}],
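    temperature=0.0,      # greedy decoding, as recommended for v16
    repeat_penalty=1.0,   # 1.0 disables the penalty; higher values can empty the <think> block
    max_tokens=512,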
)
print(resp["choices"][0]["message"]["content"])
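
The f16 and q8_0 sizes in the file table can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes a nominal 0.6e9 parameters, 16 bits per weight for f16, and 8.5 bits per weight for q8_0 (blocks of 32 int8 weights plus one 16-bit scale); it ignores file metadata and any tensors stored at a different precision, so it is an estimate only. q4_k_m mixes several quantization types and is left out.

```python
def estimate_gguf_size_mb(n_params: float, bits_per_weight: float) -> float:
    """Rough file size in MB (10^6 bytes) from parameter count and average bits per weight."""
    return n_params * bits_per_weight / 8 / 1e6

N_PARAMS = 0.6e9  # nominal parameter count (assumption; the exact count differs slightly)

f16_mb = estimate_gguf_size_mb(N_PARAMS, 16.0)
q8_0_mb = estimate_gguf_size_mb(N_PARAMS, 8.5)
print(f"f16  estimate: {f16_mb:.1f} MB")
print(f"q8_0 estimate: {q8_0_mb:.1f} MB")
```

Both estimates land close to the ~1.2 GB and ~640 MB figures listed for the published files.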

Hugging Face transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

tok   = AutoTokenizer.from_pretrained("clglavan/magos-k8s-0.6b")
# dtype= is the keyword in recent transformers releases; older releases expect torch_dtype=
model = AutoModelForCausalLM.from_pretrained("clglavan/magos-k8s-0.6b",
                                             dtype="bfloat16",
                                             device_map="auto")

messages = [{"role": "user", "content":
    "My pod is CrashLoopBackOff right after deploy. What's the likely cause and fix?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
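# Greedy decoding with the repetition penalty disabled, as recommended above
out = model.generate(**inputs, max_new_tokens=512,
                     do_sample=False, repetition_penalty=1.0)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))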

Training

| | |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Method | Two-stage: continued pre-training (CPT) → supervised fine-tuning (SFT). Both full-weight (no LoRA). |
| Stage 1 corpus | 8.5k document chunks: kubernetes.io docs + blog (6.5k), Kubernetes API reference v1.34 (1.9k), Prometheus alert runbooks (106). Unchanged since v5. |
| Stage 1 | LR 5e-6, cosine, 1 epoch (~6.5M tokens) |
| Stage 2 corpus (v16) | ~108k synthetic Q&A pairs derived from the official documentation, all with a structured <think> reasoning block: event→YAML diagnostic matched BROKEN/HEALTHY pairs across 80 K8s failure subcategories plus a kubectl command-reference track (45 subcommands + flags). Every YAML field is validated against the v1.34 OpenAPI schema and every flag against the kubectl reference, so the teacher only phrases scenarios around verified facts. |
| Stage 2 | LR 2e-5, cosine, 4 epochs, micro-batch 1 / grad-accum 32 (effective batch 32), seq len 2048, bf16 |
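
The Stage 2 settings imply a rough optimizer-step budget and learning-rate curve. The sketch below assumes exactly 108,000 examples, one example per sequence (no packing), no warmup, and a cosine decay to zero; none of these details are stated in the card, so the numbers are an approximation.

```python
import math

def optimizer_steps(n_examples: int, effective_batch: int, epochs: int) -> int:
    """Optimizer steps when each step consumes `effective_batch` examples."""
    return math.ceil(n_examples / effective_batch) * epochs

def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine schedule from peak_lr at step 0 down to 0 at total_steps (no warmup)."""
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * step / total_steps))

total = optimizer_steps(n_examples=108_000, effective_batch=32, epochs=4)
print(total)  # 13500
print(cosine_lr(total // 2, total, peak_lr=2e-5))  # about half the peak LR at the midpoint
```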

Files

  • model.safetensors — fine-tuned weights, HF format (bf16)
  • magos-k8s-0.6b-f16.gguf / -q8_0.gguf / -q4_k_m.gguf — GGUF quantizations
  • tokenizer.json, tokenizer_config.json, chat_template.jinja — Qwen3 tokenizer + ChatML template
  • config.json, generation_config.json — standard HF configs

Limitations and intended use

This is a small experimental model. Always verify any command, YAML, or behavioral claim against current Kubernetes documentation before running in production. Intended for learning, prototyping, and as a component in local devops agents — not as an authoritative source.
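
One way to act on this in a local agent is to let only read-only commands run without human review. The guard below is a deliberately conservative sketch: the function name and the verb allowlist are assumptions, and it rejects anything with shell metacharacters or with flags placed before the verb rather than trying to parse them.

```python
import shlex

# kubectl subcommands that only read cluster state (illustrative allowlist).
READ_ONLY_VERBS = {"get", "describe", "logs", "explain", "top", "events"}
# Characters that could chain or redirect commands if the string reached a shell.
SHELL_METACHARS = set(";|&`$<>\n")

def is_read_only_kubectl(command: str) -> bool:
    """True only for `kubectl <read-only verb> ...` with no shell metacharacters."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return len(tokens) >= 2 and tokens[0] == "kubectl" and tokens[1] in READ_ONLY_VERBS

print(is_read_only_kubectl("kubectl describe pod web-0 -n prod"))  # True
print(is_read_only_kubectl("kubectl delete pod web-0"))            # False
```

Commands that fail the check (including mutating ones such as apply, patch, or delete) would be surfaced to a human instead of executed.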

License

Apache 2.0. Inherits from the Qwen3-0.6B base model license. The training data is derived from the official Kubernetes documentation (CC-BY 4.0) and the prometheus-operator Prometheus runbooks (Apache 2.0).
