Upload AION unified hybrid assistant with local eval results

Files changed (8)

README.md +31 -0
assets/aion_architecture.svg +32 -0
assets/aion_benchmark.svg +19 -0
assets/aion_logo.svg +46 -0
benchmark/SMALL_MODEL_COMPARISON.md +41 -0
benchmark/benchmark_compare_small_models.py +66 -0
download/AION-1.zip +3 -0
results/small_model_comparison.json +41 -0

README.md CHANGED

@@ -18,6 +18,11 @@ pipeline_tag: text-generation
 # AION
 AION is a tiny hybrid local assistant built in a constrained CPU environment. It unifies several learned and symbolic components into one entrypoint:
 ```python
@@ -65,6 +70,27 @@ print(generate("hola"))
   - internet,
   - machine learning.
 ## Architecture
 AION is not a transformer LLM. It is a merged hybrid model:
@@ -95,6 +121,9 @@ print(generate("what can you do"))
 ## Evaluation
 Local evaluation results are in:
 ```text
@@ -114,6 +143,8 @@ Summary:
 Important: these are **not official Hugging Face leaderboard results**. AION is not a standard `transformers` model and cannot be directly submitted to most official HF benchmark leaderboards without a custom evaluation adapter. The GSM8K sample result is included honestly and shows the current limitation on multi-step word problems.
 ## Limitations
 - Not a large language model.

 # AION
+![AION logo](assets/aion_logo.svg)
+![AION architecture](assets/aion_architecture.svg)
 AION is a tiny hybrid local assistant built in a constrained CPU environment. It unifies several learned and symbolic components into one entrypoint:
 ```python
   - internet,
   - machine learning.
+## Download
+You can download the complete ready-to-run package from the repository files:
+```bash
+git lfs install
+git clone https://huggingface.co/VoidWalkercero/AION-1
+cd AION-1
+python aion.py "hola"
+```
+Or from Python:
+```python
+from huggingface_hub import snapshot_download
+path = snapshot_download("VoidWalkercero/AION-1")
+print(path)
+```
+A zipped copy is also included under `download/AION-1.zip`.
 ## Architecture
 AION is not a transformer LLM. It is a merged hybrid model:
 ## Evaluation
+![AION benchmark snapshot](assets/aion_benchmark.svg)
 Local evaluation results are in:
 ```text
 Important: these are **not official Hugging Face leaderboard results**. AION is not a standard `transformers` model and cannot be directly submitted to most official HF benchmark leaderboards without a custom evaluation adapter. The GSM8K sample result is included honestly and shows the current limitation on multi-step word problems.
+For optional comparison with small HF models, see `benchmark/benchmark_compare_small_models.py` and `benchmark/SMALL_MODEL_COMPARISON.md`.
 ## Limitations
 - Not a large language model.

assets/aion_architecture.svg ADDED

assets/aion_benchmark.svg ADDED

assets/aion_logo.svg ADDED

benchmark/SMALL_MODEL_COMPARISON.md ADDED

	@@ -0,0 +1,41 @@

+# AION vs small models
+This repository includes `benchmark/benchmark_compare_small_models.py` to compare AION with small Hugging Face causal LMs on the same tiny local suite.
+## Local result included
+```json
+[
+  {
+    "model": "AION-1",
+    "passed": 5,
+    "total": 5,
+    "accuracy": 1.0
+  }
+]
+```
+## How to compare with other small models
+Install optional dependencies:
+```bash
+pip install torch transformers accelerate
+```
+Run:
+```bash
+python benchmark/benchmark_compare_small_models.py \
+  --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct Qwen/Qwen2.5-0.5B-Instruct
+```
+The script writes:
+```text
+results/small_model_comparison.json
+```
+## Important
+AION is not a transformer LLM, so direct benchmark comparisons are not apples-to-apples. AION is tiny, hybrid, and specialized. It may outperform generic small LMs on its hand-designed local suite (five prompts scored by substring match; only the AION-1 result is included here), but it performs poorly on real multi-step GSM8K reasoning.

benchmark/benchmark_compare_small_models.py ADDED

	@@ -0,0 +1,66 @@

+#!/usr/bin/env python3
+"""Compare AION against small Hugging Face causal LMs on the same tiny local suite.
+This script is optional. It requires transformers/torch for HF baselines.
+Example:
+  python benchmark/benchmark_compare_small_models.py --models TinyLlama/TinyLlama-1.1B-Chat-v1.0 HuggingFaceTB/SmolLM2-135M-Instruct
+"""
+from __future__ import annotations
+import argparse, json, time
+from pathlib import Path
+import sys
+sys.path.append(str(Path(__file__).resolve().parents[1]))
+from aion import generate as aion_generate
+TESTS = [
+    {"suite":"chat", "prompt":"hola", "contains":["hello", "awake"]},
+    {"suite":"python", "prompt":"write code to keep numbers greater than 12", "contains":["x > 12", "filter"]},
+    {"suite":"web", "prompt":"create a responsive landing page with dark mode", "contains":["<!doctype html>", "@media"]},
+    {"suite":"math", "prompt":"solve 2x + 5 = 17", "contains":["6"]},
+    {"suite":"science", "prompt":"force mass 10 acceleration 2", "contains":["20"]},
+]
+def score_output(out, needles):
+    """Return True if at least one needle occurs in the output (case-insensitive)."""
+    low = out.lower()
+    return any(n.lower() in low for n in needles)
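+def eval_generator(name, gen):
+    """Run every test in TESTS through `gen` and collect pass/fail rows."""
+    rows = []
+    passed = 0
+    t0 = time.time()
+    for t in TESTS:
+        out = gen(t["prompt"])
+        ok = score_output(out, t["contains"])
+        passed += int(ok)
+        # Keep only the first 500 characters of each output in the report.
+        rows.append({"suite": t["suite"], "prompt": t["prompt"], "passed": ok, "output_preview": out[:500]})
+    summary = {"model": name, "passed": passed, "total": len(TESTS), "accuracy": passed / len(TESTS)}
+    return {**summary, "seconds": time.time() - t0, "rows": rows}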
+def hf_generator(model_id, max_new_tokens=350):
+    from transformers import AutoTokenizer, AutoModelForCausalLM
+    import torch
+    tok=AutoTokenizer.from_pretrained(model_id)
+    model=AutoModelForCausalLM.from_pretrained(model_id, device_map="auto" if torch.cuda.is_available() else None)
+    model.eval()
+    def gen(prompt):
+        full=f"Answer the request.\nRequest: {prompt}\nAnswer:"
+        inputs=tok(full, return_tensors="pt")
+        inputs={k:v.to(model.device) for k,v in inputs.items()}
+        with torch.no_grad():
+            out=model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, pad_token_id=tok.eos_token_id)
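+        # Decode only the newly generated tokens so the echoed prompt cannot satisfy a check.
+        new_tokens = out[0][inputs["input_ids"].shape[1]:]
+        return tok.decode(new_tokens, skip_special_tokens=True)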
+    return gen
+def main():
+    ap=argparse.ArgumentParser()
+    ap.add_argument("--models", nargs="*", default=[])
+    ap.add_argument("--out", default="results/small_model_comparison.json")
+    args=ap.parse_args()
+    results=[eval_generator("AION-1", aion_generate)]
+    for model_id in args.models:
+        try:
+            results.append(eval_generator(model_id, hf_generator(model_id)))
+        except Exception as e:
+            results.append({"model":model_id,"error":str(e)})
+    out=Path(__file__).resolve().parents[1]/args.out
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8")
+    print(json.dumps([{k:v for k,v in r.items() if k!='rows'} for r in results], indent=2))
+if __name__=="__main__": main()
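
Note on scoring: `score_output` passes a test when any one of the expected substrings appears in the output. The sketch below is not part of the commit. It mirrors that rule and contrasts it with a hypothetical stricter variant, `score_all`, to show how lenient the check is. The sample outputs are invented for illustration.

```python
def score_any(out: str, needles: list) -> bool:
    """Mirror of score_output: pass if at least one needle appears (case-insensitive)."""
    low = out.lower()
    return any(n.lower() in low for n in needles)


def score_all(out: str, needles: list) -> bool:
    """Hypothetical stricter variant: pass only if every needle appears."""
    low = out.lower()
    return all(n.lower() in low for n in needles)


# An incorrect answer that merely contains the digit "6" still passes the math check.
wrong_math = "x = 16"
print(score_any(wrong_math, ["6"]))

# A greeting with only one of the two expected words passes "any" but not "all".
greeting = "Hello there!"
print(score_any(greeting, ["hello", "awake"]))
print(score_all(greeting, ["hello", "awake"]))
```

Because a single short substring such as `"6"` can match by accident, a 5/5 score on this suite is a weak signal. An exact-answer check for the math and science cases would be a tighter test.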

download/AION-1.zip ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9a38c74333f4566e9cb502d0530ccbee3a33b9e4c005766d3afe16ff95744aac
+size 3317364
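
The `oid sha256:` value in a Git LFS pointer is the SHA-256 digest of the real file, so a downloaded archive can be checked against it. The following is a minimal sketch and not part of the commit. The helper names `sha256_of` and `matches_pointer` are hypothetical, and the local path assumes the repository was cloned with LFS enabled and the script is run from its root.

```python
import hashlib
from pathlib import Path

# Values copied from the LFS pointer in this commit.
EXPECTED_SHA256 = "9a38c74333f4566e9cb502d0530ccbee3a33b9e4c005766d3afe16ff95744aac"
EXPECTED_SIZE = 3317364


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_pointer(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Return True if both the byte size and the digest match the pointer."""
    p = Path(path)
    return p.stat().st_size == expected_size and sha256_of(p) == expected_sha256


if __name__ == "__main__":
    zip_path = Path("download/AION-1.zip")  # assumed location after `git clone`
    if zip_path.exists():
        print("OK" if matches_pointer(zip_path) else "MISMATCH")
    else:
        print(f"{zip_path} not found")
```

If the check reports a mismatch and the file is only about a hundred bytes, the clone most likely fetched the pointer text rather than the archive, which happens when Git LFS is not installed.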

results/small_model_comparison.json ADDED

	@@ -0,0 +1,41 @@

+[
+  {
+    "model": "AION-1",
+    "passed": 5,
+    "total": 5,
+    "accuracy": 1.0,
+    "seconds": 0.38956117630004883,
+    "rows": [
+      {
+        "suite": "chat",
+        "prompt": "hola",
+        "passed": true,
+        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: math_word_problem=0.75, chat_greeting=0.25, python=0.00, web=0.00, math_arithmetic=0.00\n- Rule override selected: chat_greeting.\n\n## Answer\nHello! I am awake. I can chat, solve math, and write Python. What do you want to build or calculate?"
+      },
+      {
+        "suite": "python",
+        "prompt": "write code to keep numbers greater than 12",
+        "passed": true,
+        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: python=1.00, math_word_problem=0.00, web=0.00, math_arithmetic=0.00, math_probability=0.00\n- Rule override selected: python.\n- Inside Python subsystem: filter=1.00, is_prime=0.00, fibonacci=0.00\n\n## Answer\n```python\ndef filter_greater_than_12(numbers):\n    result = []\n    for x in numbers:\n        if x > 12:\n            result.append(x)\n    retu"
+      },
+      {
+        "suite": "web",
+        "prompt": "create a responsive landing page with dark mode",
+        "passed": true,
+        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: web=1.00, math_word_problem=0.00, python=0.00, math_statistics=0.00, science_physics=0.00\n- Rule override selected: web.\n- Inside Web subsystem: landing_page=1.00, dark_mode=0.00, full_page=0.00\n\n## Answer\n```html\n<!doctype html>\n<html lang=\"en\">\n<head>\n  <meta charset=\"utf-8\" />\n  <meta name=\"viewport\" content=\"width=device-width, initial-scale"
+      },
+      {
+        "suite": "math",
+        "prompt": "solve 2x + 5 = 17",
+        "passed": true,
+        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: math_linear_equation=1.00, math_quadratic=0.00, math_arithmetic=0.00, science_physics=0.00, math_statistics=0.00\n- Selected: math_linear_equation.\n\n## Answer\nSolve linear equation. Move terms conceptually into ax + b = 0. Here a=2, b=-12, so x = -b/a = 6."
+      },
+      {
+        "suite": "science",
+        "prompt": "force mass 10 acceleration 2",
+        "passed": true,
+        "output_preview": "## Reasoning\n- I read the request using learned character fragments plus lightweight rule checks for common chat/math/code cases.\n- Top learned intents: math_word_problem=1.00, science_physics=0.00, python=0.00, web=0.00, math_arithmetic=0.00\n- Rule override selected: science_physics.\n\n## Answer\nNewton's second law: F=ma=10×2=20 N."
+      }
+    ]
+  }
+]
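
For reading this file once baseline models have been added, a short summary script may help. The sketch below is not part of the commit and `summarize` is a hypothetical helper. It relies only on the keys visible above (`model`, `passed`, `total`, `accuracy`, `rows`) plus the `error` key that the benchmark script writes when a model fails to load.

```python
import json
from pathlib import Path


def summarize(results):
    """Return one line per model: pass count, accuracy, and any failed suites."""
    lines = []
    for r in results:
        if "error" in r:
            # The benchmark script records load/generation failures this way.
            lines.append(f"{r['model']}: error: {r['error']}")
            continue
        failed = [row["suite"] for row in r.get("rows", []) if not row["passed"]]
        line = f"{r['model']}: {r['passed']}/{r['total']} ({r['accuracy']:.0%})"
        if failed:
            line += " failed: " + ", ".join(failed)
        lines.append(line)
    return lines


if __name__ == "__main__":
    path = Path("results/small_model_comparison.json")  # assumed working directory: repo root
    if path.exists():
        for line in summarize(json.loads(path.read_text(encoding="utf-8"))):
            print(line)
```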