# AION vs small models

This repository includes `benchmark/benchmark_compare_small_models.py`, a script that compares AION with small Hugging Face causal language models on the same tiny local test suite.

## Local result included

```json
[
  {
    "model": "AION-1",
    "passed": 5,
    "total": 5,
    "accuracy": 1.0
  }
]
```
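As a quick sanity check on a result record, the sketch below verifies that `accuracy` agrees with `passed / total`. That relationship is an assumption inferred from the numbers above (5 of 5 gives 1.0), not something confirmed from the benchmark script, and `check_record` is a hypothetical helper, not part of the repository.

```python
import json

# Record copied from the local result shown above.
LOCAL_RESULT = """
[
  {"model": "AION-1", "passed": 5, "total": 5, "accuracy": 1.0}
]
"""


def check_record(record: dict) -> bool:
    """Return True if accuracy equals passed / total (assumed definition)."""
    total = record["total"]
    if total <= 0:
        # An empty suite has no meaningful accuracy.
        return False
    return abs(record["accuracy"] - record["passed"] / total) < 1e-9


records = json.loads(LOCAL_RESULT)
for record in records:
    print(record["model"], "consistent:", check_record(record))
```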

## How to compare with other small models

Install optional dependencies:

```bash
pip install torch transformers accelerate
```

Run:

```bash
python benchmark/benchmark_compare_small_models.py \
  --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
```

The script writes:

```text
results/small_model_comparison.json
```
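To read the output side by side, something like the following may help. It is a minimal sketch that assumes the comparison file is a JSON list of records with the same `model`, `passed`, `total`, and `accuracy` keys as the local result above; the actual schema written by the script may differ. `load_results` and `format_table` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_results(path):
    """Load the comparison file (assumed to be a JSON list of records)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def format_table(records):
    """Return a plain-text table, highest accuracy first."""
    rows = sorted(records, key=lambda r: r["accuracy"], reverse=True)
    lines = [f"{'model':<45} {'passed':>6} {'total':>5} {'accuracy':>8}"]
    for r in rows:
        lines.append(
            f"{r['model']:<45} {r['passed']:>6} {r['total']:>5} {r['accuracy']:>8.2f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    path = Path("results/small_model_comparison.json")
    if path.exists():
        print(format_table(load_results(path)))
    else:
        print(f"{path} not found; run the benchmark script first.")
```

Sorting by accuracy only orders the rows; as noted under Important, the numbers are not a like-for-like comparison.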

## Important

AION is not a transformer LLM, so direct benchmark comparisons are not apples-to-apples. AION is tiny, hybrid, and specialized. It can outperform generic small LMs on its hand-designed local suite, but it performs poorly on real multi-step GSM8K reasoning.