# AION vs small models
This repository includes `benchmark/benchmark_compare_small_models.py` to compare AION with small Hugging Face causal LMs on the same tiny local suite.
## Local result included
```json
[
{
"model": "AION-1",
"passed": 5,
"total": 5,
"accuracy": 1.0
}
]
```
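As a rough aid for reading this result, the sketch below recomputes `accuracy` from `passed` and `total` and attaches a Wilson score interval to it. This is not part of the repository's scripts; the helper name `wilson_interval` and the choice of a 95% interval (`z = 1.96`) are assumptions made for illustration.

```python
import math


def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passed / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Record copied from the local result above.
result = {"model": "AION-1", "passed": 5, "total": 5, "accuracy": 1.0}

# The stored accuracy should simply be passed / total.
assert result["accuracy"] == result["passed"] / result["total"]

low, high = wilson_interval(result["passed"], result["total"])
print(f"{result['model']}: accuracy={result['accuracy']:.2f}, interval=[{low:.2f}, {high:.2f}]")
```

With only five items the lower bound comes out near 0.57, so a perfect score on this suite is compatible with a fairly wide range of underlying pass rates.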
## How to compare with other small models
Install optional dependencies:
```bash
pip install torch transformers accelerate
```
Run:
```bash
python benchmark/benchmark_compare_small_models.py \
--models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
```
The script writes:
```text
results/small_model_comparison.json
```
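Once the file exists, a few lines of Python are enough to rank the models. The sketch below assumes the output file uses the same record layout as the local result shown earlier (a JSON list of objects with `model`, `passed`, and `total`); the actual schema written by the script may differ, and the helper names `rank_results` and `format_table` are hypothetical.

```python
import json
from pathlib import Path

# Path taken from this README; the record layout is an assumption.
RESULTS_PATH = Path("results/small_model_comparison.json")


def rank_results(rows: list[dict]) -> list[dict]:
    """Return records sorted by accuracy (highest first), recomputed from passed/total."""
    ranked = []
    for row in rows:
        total = row.get("total", 0)
        accuracy = row["passed"] / total if total else 0.0
        ranked.append(
            {"model": row["model"], "passed": row["passed"], "total": total, "accuracy": accuracy}
        )
    return sorted(ranked, key=lambda r: r["accuracy"], reverse=True)


def format_table(rows: list[dict]) -> str:
    """Render ranked records as a fixed-width text table."""
    lines = [f"{'model':<45} {'passed':>6} {'total':>5} {'accuracy':>8}"]
    for r in rows:
        lines.append(f"{r['model']:<45} {r['passed']:>6} {r['total']:>5} {r['accuracy']:>8.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        rows = json.loads(RESULTS_PATH.read_text(encoding="utf-8"))
        print(format_table(rank_results(rows)))
    else:
        print(f"{RESULTS_PATH} not found; run the benchmark script first.")
```

Recomputing accuracy from `passed` and `total` rather than trusting the stored field keeps the ranking consistent even if a record omits `accuracy`.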
## Important
AION is not a transformer LLM, so comparing its scores directly against causal LMs is not an apples-to-apples comparison. It is a tiny, hybrid, specialized system: it can outperform generic small LMs on its own hand-designed local suite, but it performs poorly on real multi-step reasoning problems from GSM8K.