AION vs small models
This repository includes benchmark/benchmark_compare_small_models.py, which runs AION and small Hugging Face causal LMs on the same tiny local test suite so their results can be compared.
Local result included
[
  {
    "model": "AION-1",
    "passed": 5,
    "total": 5,
    "accuracy": 1.0
  }
]
How to compare with other small models
Install optional dependencies:
pip install torch transformers accelerate
Run:
python benchmark/benchmark_compare_small_models.py \
  --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
The script writes:
results/small_model_comparison.json
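To read the output file, something like the following sketch may help. It assumes the file is a JSON list whose records use the same "model", "passed", and "total" keys as the local AION-1 result above; the script's actual schema may differ. The helper names load_results and format_summary are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Output location as stated in this README.
RESULTS_PATH = Path("results/small_model_comparison.json")


def load_results(path=RESULTS_PATH):
    """Load the comparison file (assumed: a JSON list of per-model records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def format_summary(results):
    """Return one summary line per model, highest accuracy first.

    Assumes each record has "model", "passed", and "total" keys, as in the
    local AION-1 result. Accuracy is recomputed as passed / total rather
    than read from the file.
    """
    rows = []
    for record in results:
        total = record["total"]
        accuracy = record["passed"] / total if total else 0.0
        rows.append((accuracy, record["model"], record["passed"], total))
    # Sort by accuracy descending, then by model name for stable ties.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return [
        f"{model}: {passed}/{total} ({accuracy:.0%})"
        for accuracy, model, passed, total in rows
    ]


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        for line in format_summary(load_results()):
            print(line)
    else:
        print(f"{RESULTS_PATH} not found; run the comparison script first.")
```

Recomputing accuracy from passed and total keeps the pass counts visible, which matters when the suite has only a handful of items.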
Important
AION is not a transformer LLM, so these benchmark comparisons are not apples-to-apples. AION is tiny, hybrid, and specialized: it can outperform generic small LMs on its own hand-designed local suite, but it performs poorly on real multi-step reasoning problems from GSM8K.