	# AION vs small models

	This repository includes `benchmark/benchmark_compare_small_models.py` to compare AION with small Hugging Face causal LMs on the same tiny local suite.

	## Local result included

	```json
	[
	{
	"model": "AION-1",
	"passed": 5,
	"total": 5,
	"accuracy": 1.0
	}
	]
	```
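As a quick sanity check on a record like the one above, the sketch below parses it and confirms that `accuracy` matches `passed / total`. That relationship is an assumption inferred from the numbers shown, not something the repository states, and `is_consistent` is a hypothetical helper name.

```python
import json
import math

# Record copied from the local result shown above.
LOCAL_RESULT = """
[
  {"model": "AION-1", "passed": 5, "total": 5, "accuracy": 1.0}
]
"""


def is_consistent(record):
    """Return True if accuracy equals passed / total (assumed relationship)."""
    total = record["total"]
    passed = record["passed"]
    # Reject impossible counts before dividing.
    if total <= 0 or not 0 <= passed <= total:
        return False
    return math.isclose(record["accuracy"], passed / total)


records = json.loads(LOCAL_RESULT)
all_ok = all(is_consistent(r) for r in records)
print("all records consistent:", all_ok)
```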

	## How to compare with other small models

	Install optional dependencies:

	```bash
	pip install torch transformers accelerate
	```

	Run:

	```bash
	python benchmark/benchmark_compare_small_models.py \
	--models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
	```

	The script writes:

	```text
	results/small_model_comparison.json
	```
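One way to read the output file is sketched below. It assumes the file holds a JSON list of records with the same keys as the local result (`model`, `passed`, `total`, `accuracy`); the actual layout written by the script may differ. `load_results` and `format_table` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path

# Path taken from this document; the record layout is an assumption.
RESULTS_PATH = Path("results/small_model_comparison.json")


def load_results(path=RESULTS_PATH):
    """Read the results file, assumed to be a JSON list of per-model records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def format_table(records):
    """Return one text line per model, highest accuracy first."""
    ranked = sorted(records, key=lambda r: r["accuracy"], reverse=True)
    return [
        f'{r["model"]}: {r["passed"]}/{r["total"]} ({r["accuracy"]:.0%})'
        for r in ranked
    ]


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        print("\n".join(format_table(load_results())))
    else:
        print(f"{RESULTS_PATH} not found; run the comparison script first.")
```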

	## Important

	AION is not a transformer LLM, so direct benchmark comparisons are not like-for-like. AION is tiny, hybrid, and specialized. It can outperform generic small LMs on its hand-designed local suite, where the 5/5 result above was measured, but it performs poorly on real multi-step GSM8K reasoning.
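
To illustrate how little a five-item suite can establish on its own, the sketch below computes a Wilson score interval for a pass rate. This is a standard statistical technique brought in here for illustration; it is not something the benchmark script is known to compute, and `wilson_interval` is a hypothetical helper.

```python
import math


def wilson_interval(passed, total, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(5, 5)
print(f"5/5 passed: plausible accuracy range {low:.2f} to {high:.2f}")
```

For 5 of 5, the lower bound comes out near 0.57, so a perfect score on this suite is still compatible with a much lower underlying pass rate.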