# AION vs small models

This repository includes `benchmark/benchmark_compare_small_models.py` to compare AION with small Hugging Face causal LMs on the same tiny local suite.

## Local result included

```json
[
  {
    "model": "AION-1",
    "passed": 5,
    "total": 5,
    "accuracy": 1.0
  }
]
```
## How to compare with other small models

Install optional dependencies:

```bash
pip install torch transformers accelerate
```

Run:

```bash
python benchmark/benchmark_compare_small_models.py \
  --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
```
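The script's internals are not reproduced in this README. As a rough illustration of what scoring several models "on the same tiny local suite" can look like, the sketch below runs any text generator against a list of prompt/expected-answer pairs. The suite contents, the substring-match rule, and the names `EXAMPLE_SUITE`, `score_model`, and `hf_generator` are all assumptions made for this example, not the repository's actual code.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical two-item suite; the repository's real suite is not shown in this README.
EXAMPLE_SUITE = [
    ("What is 2 + 3? Answer with a number.", "5"),
    ("What is the capital of France? Answer with one word.", "Paris"),
]


def score_model(name: str, generate: Callable[[str], str],
                suite: Iterable[Tuple[str, str]]) -> dict:
    """Run every prompt through `generate` and count case-insensitive substring matches."""
    passed = 0
    total = 0
    for prompt, expected in suite:
        total += 1
        if expected.lower() in generate(prompt).lower():
            passed += 1
    return {"model": name, "passed": passed, "total": total,
            "accuracy": passed / total if total else 0.0}


def hf_generator(model_name: str, max_new_tokens: int = 32) -> Callable[[str], str]:
    """Wrap a Hugging Face text-generation pipeline as a prompt -> text function.

    transformers is imported lazily so the scoring logic works without it installed.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_name)

    def generate(prompt: str) -> str:
        # return_full_text=False keeps only the newly generated continuation.
        out = pipe(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
        return out[0]["generated_text"]

    return generate
```

For example, `score_model("Qwen/Qwen2.5-0.5B-Instruct", hf_generator("Qwen/Qwen2.5-0.5B-Instruct"), EXAMPLE_SUITE)` would download that model and return a record in the same shape as the local result above. Substring matching is deliberately lenient (an expected answer of `5` also matches `15`), so a stricter comparison may be preferable for a real suite.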
The script writes:

```text
results/small_model_comparison.json
```
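To inspect that file, a minimal sketch follows. It assumes the output is a list of records with the same `model`, `passed`, and `total` fields as the local result shown earlier; the schema the script actually writes may differ, and `summarize` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path

# Output path as stated above.
RESULTS_PATH = Path("results/small_model_comparison.json")


def summarize(records):
    """Return one formatted line per model, best accuracy first.

    Assumes each record has "model", "passed", and "total" keys
    (the schema of the local result; an assumption for other models).
    """
    rows = []
    for rec in records:
        total = rec["total"]
        # Recompute accuracy from the raw counts instead of trusting the stored value.
        accuracy = rec["passed"] / total if total else 0.0
        rows.append((rec["model"], rec["passed"], total, accuracy))
    rows.sort(key=lambda row: row[3], reverse=True)
    return [f"{name}: {passed}/{total} ({acc:.0%})" for name, passed, total, acc in rows]


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        records = json.loads(RESULTS_PATH.read_text(encoding="utf-8"))
        print("\n".join(summarize(records)))
    else:
        print(f"{RESULTS_PATH} not found; run the benchmark script first.")
```

Recomputing accuracy from `passed` and `total` keeps the summary consistent even if a stored `accuracy` value is rounded or missing.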
## Important

AION is not a transformer LLM, so direct benchmark comparisons with the models above are not apples-to-apples. AION is tiny, hybrid, and specialized: it can outperform generic small LMs on its hand-designed local suite, but it performs poorly on real multi-step GSM8K reasoning.