AION vs small models

This repository includes benchmark/benchmark_compare_small_models.py, a script that runs AION and small Hugging Face causal language models on the same tiny local test suite so their scores can be compared.

Local result included

[
  {
    "model": "AION-1",
    "passed": 5,
    "total": 5,
    "accuracy": 1.0
  }
]
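As a quick sanity check, the accuracy field can be recomputed from passed and total. The sketch below embeds the result shown above verbatim; the helper name check_rows is hypothetical and is not part of the repository.

```python
import json

# The local result from this README, copied verbatim.
LOCAL_RESULT = """
[
  {"model": "AION-1", "passed": 5, "total": 5, "accuracy": 1.0}
]
"""


def check_rows(rows):
    """Return the model names whose accuracy disagrees with passed / total.

    check_rows is a hypothetical helper written for this example.
    """
    inconsistent = []
    for row in rows:
        # Guard against an empty suite to avoid dividing by zero.
        expected = row["passed"] / row["total"] if row["total"] else 0.0
        if abs(expected - row["accuracy"]) > 1e-9:
            inconsistent.append(row["model"])
    return inconsistent


rows = json.loads(LOCAL_RESULT)
inconsistent = check_rows(rows)
print("inconsistent rows:", inconsistent)
```

An empty list means every row is internally consistent. It says nothing about how representative a five-task suite is.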

How to compare with other small models

Install optional dependencies:

pip install torch transformers accelerate

Run:

python benchmark/benchmark_compare_small_models.py \
  --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct

The script writes:

results/small_model_comparison.json
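To read the output side by side, a short script along these lines may help. It is only a sketch: it assumes the output file uses the same schema as the local result in this README (a list of objects with model, passed, total, and accuracy), which the script's actual output may not match. The function name format_table is hypothetical.

```python
import json
from pathlib import Path

# Output path named in this README.
RESULTS_PATH = Path("results/small_model_comparison.json")


def format_table(rows):
    """Render result rows as a plain-text table, best accuracy first.

    Assumes each row has "model", "passed", "total", and "accuracy" keys,
    mirroring the local result shown in this README.
    """
    ordered = sorted(rows, key=lambda r: r["accuracy"], reverse=True)
    # Column width fits the longest model name (or the header itself).
    width = max([len("model")] + [len(r["model"]) for r in ordered])
    lines = [f"{'model'.ljust(width)}  passed  accuracy"]
    for r in ordered:
        score = f"{r['passed']}/{r['total']}"
        lines.append(
            f"{r['model'].ljust(width)}  {score.ljust(6)}  {r['accuracy']:.2f}"
        )
    return "\n".join(lines)


# Only print when the benchmark has already been run.
if RESULTS_PATH.exists():
    print(format_table(json.loads(RESULTS_PATH.read_text())))
```

Run it from the repository root after the benchmark finishes, so the relative path resolves.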

Important

AION is not a transformer LLM, so these benchmark comparisons are not apples-to-apples. AION is tiny, hybrid, and specialized. It can outperform generic small LMs on its own hand-designed local suite, which contains only five tasks, but it performs poorly on real multi-step GSM8K reasoning.