Upload AION unified hybrid assistant with local eval results
- README.md +31 -0
- assets/aion_architecture.svg +32 -0
- assets/aion_benchmark.svg +19 -0
- assets/aion_logo.svg +46 -0
- benchmark/SMALL_MODEL_COMPARISON.md +41 -0
- benchmark/benchmark_compare_small_models.py +66 -0
- download/AION-1.zip +3 -0
- results/small_model_comparison.json +41 -0
README.md
CHANGED
@@ -18,6 +18,11 @@ pipeline_tag: text-generation
 
 # AION
 
+
+
+
+
+
 AION is a tiny hybrid local assistant built in a constrained CPU environment. It unifies several learned and symbolic components into one entrypoint:
 
 ```python
@@ -65,6 +70,27 @@ print(generate("hola"))
 - internet,
 - machine learning.
 
+## Download
+
+You can download the complete ready-to-run package from the repository files:
+
+```bash
+git lfs install
+git clone https://huggingface.co/VoidWalkercero/AION-1
+cd AION-1
+python aion.py "hola"
+```
+
+Or from Python:
+
+```python
+from huggingface_hub import snapshot_download
+path = snapshot_download("VoidWalkercero/AION-1")
+print(path)
+```
+
+A zipped copy is also included under `download/AION-1.zip`.
+
 ## Architecture
 
 AION is not a transformer LLM. It is a merged hybrid model:
@@ -95,6 +121,9 @@ print(generate("what can you do"))
 
 ## Evaluation
 
+
+
+
 Local evaluation results are in:
 
 ```text
@@ -114,6 +143,8 @@ Summary:
 
 Important: these are **not official Hugging Face leaderboard results**. AION is not a standard `transformers` model and cannot be directly submitted to most official HF benchmark leaderboards without a custom evaluation adapter. The GSM8K sample result is included honestly and shows the current limitation on multi-step word problems.
 
+For optional comparison with small HF models, see `benchmark/benchmark_compare_small_models.py` and `benchmark/SMALL_MODEL_COMPARISON.md`.
+
 ## Limitations
 
 - Not a large language model.
assets/aion_architecture.svg
ADDED
assets/aion_benchmark.svg
ADDED
assets/aion_logo.svg
ADDED
benchmark/SMALL_MODEL_COMPARISON.md
ADDED
@@ -0,0 +1,41 @@
# AION vs small models

This repository includes `benchmark/benchmark_compare_small_models.py` to compare AION with small Hugging Face causal LMs on the same tiny local suite.

## Local result included

```json
[
  {
    "model": "AION-1",
    "passed": 5,
    "total": 5,
    "accuracy": 1.0
  }
]
```

## How to compare with other small models

Install optional dependencies:

```bash
pip install torch transformers accelerate
```

Run:

```bash
python benchmark/benchmark_compare_small_models.py \
  --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
```

The script writes:

```text
results/small_model_comparison.json
```

## Important

AION is not a transformer LLM, so direct benchmark comparisons are not apples-to-apples. AION is tiny, hybrid, and specialized. It can outperform generic small LMs on its hand-designed local suite, but it performs poorly on real multi-step GSM8K reasoning.
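Review note: the pass criterion behind these numbers is a case-insensitive substring check. In `benchmark/benchmark_compare_small_models.py`, a test passes when any one of its `contains` strings appears in the output. The sketch below reproduces that rule next to a stricter all-needles variant to show how permissive it can be. `score_any`, `score_all`, and the sample strings are illustrative names and made-up data, not part of the repository.

```python
def score_any(output: str, needles: list[str]) -> bool:
    """Pass if at least one needle occurs in the output (the script's rule)."""
    low = output.lower()
    return any(n.lower() in low for n in needles)


def score_all(output: str, needles: list[str]) -> bool:
    """Hypothetical stricter variant: every needle must occur."""
    low = output.lower()
    return all(n.lower() in low for n in needles)


# Made-up outputs, not taken from a real run.
wrong_math_answer = "The answer is 16."   # correct answer for 2x + 5 = 17 is 6
partial_greeting = "Hello there"

# "6" is a substring of "16", so a wrong answer still passes the check.
print(score_any(wrong_math_answer, ["6"]))
# One needle of two is enough under the any-rule, but not under the all-rule.
print(score_any(partial_greeting, ["hello", "awake"]))
print(score_all(partial_greeting, ["hello", "awake"]))
```

With only five tests, each lenient match moves the reported accuracy by 20 percentage points, which is worth keeping in mind when reading the comparison.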
benchmark/benchmark_compare_small_models.py
ADDED
@@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""Compare AION against small Hugging Face causal LMs on the same tiny local suite.

This script is optional. It requires transformers/torch for HF baselines.
Example:
python benchmark/benchmark_compare_small_models.py --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct
"""
from __future__ import annotations
import argparse, json, re, time
from pathlib import Path
import sys
sys.path.append(str(Path(__file__).resolve().parents[1]))
from aion import generate as aion_generate

TESTS = [
    {"suite":"chat", "prompt":"hola", "contains":["hello", "awake"]},
    {"suite":"python", "prompt":"write code to keep numbers greater than 12", "contains":["x > 12", "filter"]},
    {"suite":"web", "prompt":"create a responsive landing page with dark mode", "contains":["<!doctype html>", "@media"]},
    {"suite":"math", "prompt":"solve 2x + 5 = 17", "contains":["6"]},
    {"suite":"science", "prompt":"force mass 10 acceleration 2", "contains":["20"]},
]

def score_output(out, needles):
    low = out.lower()
    return any(n.lower() in low for n in needles)

def eval_generator(name, gen):
    rows=[]; passed=0; t0=time.time()
    for t in TESTS:
        out=gen(t["prompt"])
        ok=score_output(out, t["contains"])
        passed += int(ok)
        rows.append({"suite":t["suite"],"prompt":t["prompt"],"passed":ok,"output_preview":out[:500]})
    return {"model":name,"passed":passed,"total":len(TESTS),"accuracy":passed/len(TESTS),"seconds":time.time()-t0,"rows":rows}

def hf_generator(model_id, max_new_tokens=350):
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    tok=AutoTokenizer.from_pretrained(model_id)
    model=AutoModelForCausalLM.from_pretrained(model_id, device_map="auto" if torch.cuda.is_available() else None)
    model.eval()
    def gen(prompt):
        full=f"Answer the request.\nRequest: {prompt}\nAnswer:"
        inputs=tok(full, return_tensors="pt")
        inputs={k:v.to(model.device) for k,v in inputs.items()}
        with torch.no_grad():
            out=model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, pad_token_id=tok.eos_token_id)
        return tok.decode(out[0], skip_special_tokens=True)
    return gen

def main():
    ap=argparse.ArgumentParser()
    ap.add_argument("--models", nargs="*", default=[])
    ap.add_argument("--out", default="results/small_model_comparison.json")
    args=ap.parse_args()
    results=[eval_generator("AION-1", aion_generate)]
    for model_id in args.models:
        try:
            results.append(eval_generator(model_id, hf_generator(model_id)))
        except Exception as e:
            results.append({"model":model_id,"error":str(e)})
    out=Path(__file__).resolve().parents[1]/args.out
    out.parent.mkdir(exist_ok=True)
    out.write_text(json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8")
    print(json.dumps([{k:v for k,v in r.items() if k!='rows'} for r in results], indent=2))

if __name__=="__main__": main()
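Review note: for decoder-only models, `model.generate` returns the prompt token ids followed by the newly generated ids. The `tok.decode(out[0], ...)` call in `hf_generator` therefore yields text that starts with the `Answer the request. ...` template, and the substring check sees the prompt as well as the answer. A minimal sketch of separating the two is shown below on plain Python lists so it runs without `transformers`. `continuation_ids` and the token ids are hypothetical stand-ins, not code from this commit.

```python
def continuation_ids(output_ids: list[int], prompt_len: int) -> list[int]:
    """Drop the first prompt_len ids and keep only the generated ones."""
    return output_ids[prompt_len:]


# Stand-in token ids: four prompt tokens followed by three generated tokens.
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43, 44]

new_ids = continuation_ids(output_ids, len(prompt_ids))
print(new_ids)  # [42, 43, 44]
```

In `hf_generator`, the corresponding change would be to decode `out[0][inputs["input_ids"].shape[1]:]` instead of `out[0]`. That is a suggestion for a follow-up, not something this commit does. With the current five tests none of the `contains` strings appears in the prompt template, so the reported scores should not be affected either way.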
download/AION-1.zip
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a38c74333f4566e9cb502d0530ccbee3a33b9e4c005766d3afe16ff95744aac
size 3317364
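Review note: the three lines above are a Git LFS pointer, not the archive itself. The `oid` is the SHA-256 digest of the real file and `size` is its length in bytes, so a downloaded `AION-1.zip` can be checked against them. The following is a minimal sketch of that check. `parse_pointer` and `matches_pointer` are hypothetical helper names introduced here for illustration.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9a38c74333f4566e9cb502d0530ccbee3a33b9e4c005766d3afe16ff95744aac
size 3317364
"""


def parse_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """Return True if data has the size and SHA-256 digest named in the pointer."""
    fields = parse_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    same_size = len(data) == int(fields["size"])
    return same_size and hashlib.sha256(data).hexdigest() == expected


fields = parse_pointer(POINTER)
print(fields["size"])  # 3317364
```

After `git lfs pull`, the real archive could be checked with `matches_pointer(open("download/AION-1.zip", "rb").read(), POINTER)`. If the file on disk is still the three-line pointer, the size comparison fails immediately.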
results/small_model_comparison.json
ADDED
@@ -0,0 +1,41 @@
[
  {
    "model": "AION-1",
    "passed": 5,
    "total": 5,
    "accuracy": 1.0,
    "seconds": 0.38956117630004883,
    "rows": [
      {
        "suite": "chat",
        "prompt": "hola",
        "passed": true,
        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: math_word_problem=0.75, chat_greeting=0.25, python=0.00, web=0.00, math_arithmetic=0.00\n- Rule override selected: chat_greeting.\n\n## Answer\nHello! I am awake. I can chat, solve math, and write Python. What do you want to build or calculate?"
      },
      {
        "suite": "python",
        "prompt": "write code to keep numbers greater than 12",
        "passed": true,
        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: python=1.00, math_word_problem=0.00, web=0.00, math_arithmetic=0.00, math_probability=0.00\n- Rule override selected: python.\n- Inside Python subsystem: filter=1.00, is_prime=0.00, fibonacci=0.00\n\n## Answer\n```python\ndef filter_greater_than_12(numbers):\n result = []\n for x in numbers:\n if x > 12:\n result.append(x)\n retu"
      },
      {
        "suite": "web",
        "prompt": "create a responsive landing page with dark mode",
        "passed": true,
        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: web=1.00, math_word_problem=0.00, python=0.00, math_statistics=0.00, science_physics=0.00\n- Rule override selected: web.\n- Inside Web subsystem: landing_page=1.00, dark_mode=0.00, full_page=0.00\n\n## Answer\n```html\n<!doctype html>\n<html lang=\"en\">\n<head>\n <meta charset=\"utf-8\" />\n <meta name=\"viewport\" content=\"width=device-width, initial-scale"
      },
      {
        "suite": "math",
        "prompt": "solve 2x + 5 = 17",
        "passed": true,
        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: math_linear_equation=1.00, math_quadratic=0.00, math_arithmetic=0.00, science_physics=0.00, math_statistics=0.00\n- Selected: math_linear_equation.\n\n## Answer\nSolve linear equation. Move terms conceptually into ax + b = 0. Here a=2, b=-12, so x = -b/a = 6."
      },
      {
        "suite": "science",
        "prompt": "force mass 10 acceleration 2",
        "passed": true,
        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: math_word_problem=1.00, science_physics=0.00, python=0.00, web=0.00, math_arithmetic=0.00\n- Rule override selected: science_physics.\n\n## Answer\nNewton's second law: F=ma=10×2=20 N."
      }
    ]
  }
]
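Review note: the results file is a JSON list with one object per model. Successful runs carry `passed`, `total`, `accuracy`, `seconds`, and `rows`, while the script's `except` branch writes only `model` and `error`. A small reader that handles both shapes might look like the sketch below. `summarize` is a hypothetical helper, and the second entry in the sample data is invented to exercise the error path.

```python
import json


def summarize(results: list[dict]) -> list[str]:
    """Render one summary line per model, covering success and error rows."""
    lines = []
    for r in results:
        if "error" in r:
            lines.append(f"{r['model']}: error ({r['error']})")
        else:
            lines.append(
                f"{r['model']}: {r['passed']}/{r['total']} passed "
                f"({r['accuracy']:.0%}) in {r['seconds']:.2f}s"
            )
    return lines


# First entry mirrors the committed result (rows omitted for brevity);
# the second entry is hypothetical and only illustrates the error shape.
sample = json.loads("""
[
  {"model": "AION-1", "passed": 5, "total": 5, "accuracy": 1.0,
   "seconds": 0.38956117630004883, "rows": []},
  {"model": "example/failed-model", "error": "hypothetical load failure"}
]
""")

for line in summarize(sample):
    print(line)
```

To read the committed file instead of the inline sample, replace the `json.loads(...)` call with `json.loads(Path("results/small_model_comparison.json").read_text(encoding="utf-8"))` and import `Path` from `pathlib`.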