---
language: en
license: apache-2.0
tags:
- symbiogenesis
- lora
- grammar
- cola
- evolutionary
- custom-architecture
- attention-free
base_model: LisaMegaWatts/SymbioSLM
datasets:
- glue
pipeline_tag: text-classification
---

# SymbioSLM Grammar Expert LoRA

A grammar-specialist LoRA adapter for [SymbioSLM](https://huggingface.co/LisaMegaWatts/SymbioSLM) (~4.3M params), trained on CoLA (Corpus of Linguistic Acceptability) via **symbiogenesis evolution**. This is an **attention-free** model: all sequence mixing uses sub-quadratic organelles (CausalConv, Monarch matrices, LongConv).

Since SymbioSLM has no PyTorch checkpoint (it's Julia-native), this experiment trained with the **full base model unfrozen** alongside LoRA, testing whether the attention-free architecture can learn grammar from scratch.

## Key Results

| Metric | At Gelation (gen 6) | Final (gen 24) |
|--------|---------------------|----------------|
| Train accuracy | 80.4% | 80.4% |
| Test accuracy | 44.6% | **60.6%** |
| Overfit gap | 35.8pp | 19.8pp |

| Metric | Value |
|--------|-------|
| Majority-class baseline | 64.1% |
| Base perplexity | 2045.6 |
| With LoRA perplexity | 2051.0 (+0.3%) |
| Grammar sense improvement | +0.009 (log-prob ratio) |
| Gelation (convergence) | Generation 6 |
| LoRA params | 2,468,116 (57.9% of base; base unfrozen) |

### Grammar Sense Signal

The LoRA-adapted model assigns relatively higher probability to grammatical sentences:

```
                        Base      With LoRA
Acceptable log-prob:    -7.619    -7.617
Unacceptable log-prob:  -7.625    -7.632
Ratio (higher=better):   0.006     0.015  (+150% relative)
```

This is a small but directionally correct signal from a randomly initialized ~4M-parameter attention-free model.
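
The "ratio" row above is the gap between the two reported log-probabilities, which is the log of the probability ratio. As a quick check of the arithmetic, the sketch below recomputes the gap, the improvement, and the relative change from the numbers in the table. The function name `grammar_gap` is illustrative only and does not come from the training code.

```python
def grammar_gap(acceptable_logprob: float, unacceptable_logprob: float) -> float:
    """Log-prob of acceptable sentences minus log-prob of unacceptable ones.

    A larger gap means the model prefers grammatical sentences more strongly.
    """
    return acceptable_logprob - unacceptable_logprob


# Values copied from the table above.
base_gap = grammar_gap(-7.619, -7.625)
lora_gap = grammar_gap(-7.617, -7.632)

improvement = lora_gap - base_gap   # absolute change in the gap
relative = improvement / base_gap   # change relative to the base gap

print(f"base gap:    {base_gap:.3f}")
print(f"LoRA gap:    {lora_gap:.3f}")
print(f"improvement: {improvement:+.3f} ({relative:+.0%} relative)")
```

Run as written, this prints a base gap of 0.006, a LoRA gap of 0.015, and an improvement of +0.009 (+150% relative), matching the reported figures.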
## Architecture

SymbioSLM is a **3-organelle** decoder-only language model with no attention:

- **CausalDepthwiseConv1d**: local n-gram pattern detection
- **MonarchMatrix** (8 heads): sub-quadratic global mixing via butterfly factorization
- **LongConv**: dense causal convolution for medium-range dependencies
- **OrganelleGate**: learned per-channel blend across organelles

```
SymbioSLM: d_model=256, n_layers=6, n_monarch_heads=8, vocab_size=2000
Total params: 4,261,650
```

The attention-free design means LoRA can only target **SwiGLU layers** (w1, v, w2), giving 3 target types × 6 blocks = 18 possible injection points, far fewer than in models with attention.

## LoRA Configuration

Manual LoRA injection (not PEFT) into SwiGLU feed-forward layers:

| Target | Layer Type | Shape (per block) |
|--------|-----------|-------------------|
| w1 | SwiGLU gate projection | 256→512 |
| w2 | SwiGLU output projection | 512→256 |

**Best evolved config**: rank=16, alpha=32.0, targets=(w1, w2)

Evolution consistently converged on the **gate+output pair** (w1, w2), preferring this over configurations that include the value projection (v).

## Evolution Details

1. **Population**: 8 random LoRAUnit configs
2. **Training**: 200 steps per unit, lr=2e-4, batch=16, **base unfrozen** (no pre-trained checkpoint)
3. **Fitness**: `accuracy - 0.01 × log(n_trainable)`
4. **Gelation**: CUSUM change-point at generation 6 (CUSUM=4.10)
5. **Post-gelation**: Architecture locked (r=16, w1+w2) but test accuracy continued improving

### Test Accuracy Over Time

```
Gen  0: 40.4%
Gen  5: 54.0%  (pre-gelation)
Gen  6: 40.0%  (at gelation)
Gen 10: 61.2%
Gen 15: 57.4%
Gen 20: 56.0%
Gen 24: 60.6%  (final)
```

Test accuracy oscillated but trended upward, suggesting continued evolution post-gelation was beneficial for this model. Gelation marked architecture convergence, not a generalization peak.

## Usage

Requires the SymbioSLM model architecture.
See the [training notebook](https://github.com/DavinciDreams/SymbioGPT) for the full model definition.

```python
import torch
from huggingface_hub import hf_hub_download

# Load LoRA weights
weights_path = hf_hub_download(
    "LisaMegaWatts/SymbioSLM-GrammarExpert-20260301",
    "lora_state.pt"
)
lora_state = torch.load(weights_path, map_location="cpu")

# Inject into SymbioSLM base model
# inject_lora(model, target_modules=['w1', 'w2'], rank=16, alpha=32.0)
# load_lora_state(model, lora_state)
```

## Files

| File | Description |
|------|-------------|
| `lora_state.pt` | LoRA A/B parameter state dict (696 KB) |
| `experiment_config.json` | Full experiment config and results |

## Part of Symbiogenesis

This is part of a **3-model grammar expert comparison**:

| Model | Params | Attention | CoLA Test Acc | Status |
|-------|--------|-----------|---------------|--------|
| [Ouroboros (Gemma 270M)](https://huggingface.co/LisaMegaWatts/Ouroboros-1MContext-Gemma-270m) | 270M | Yes (standard) | Pending | Notebook ready |
| [SymbioGPT-10M](https://huggingface.co/LisaMegaWatts/SymbioGPT-GrammarExpert-20260301) | 10M | Yes (+ organelles) | 53.2% | Complete |
| **SymbioSLM ~4M** (this) | 4.3M | **No** | **60.6%** | Complete |

**W&B run**: [grammar-expert-symbioslm](https://wandb.ai/lisamegawatts-decentralized-intelligence-agency/symbiogenesis/runs/0qlysv4e)

GitHub: [DavinciDreams/SymbioGPT](https://github.com/DavinciDreams/SymbioGPT)
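
The `inject_lora` call in the Usage snippet is commented out and its definition is not shown in this card. As a rough illustration only, the sketch below shows what manual LoRA injection into SwiGLU projections could look like. Everything in it is an assumption: the `LoRALinear` wrapper, this version of `inject_lora`, and the `ToySwiGLU` stand-in block (which assumes bias-free `nn.Linear` projections named `w1`, `v`, `w2`) are hypothetical and may differ from the notebook's implementation. It does not attempt to load `lora_state.pt`, because the key layout of that file is not documented here.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: base linear layer plus a scaled low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base  # left trainable here, mirroring the "base unfrozen" setup
        self.scale = alpha / rank
        # A is small and random, B is zero, so the wrapped layer initially
        # computes exactly what the base layer computes.
        self.lora_A = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + self.scale * update


def inject_lora(model: nn.Module, target_modules, rank: int, alpha: float) -> int:
    """Wrap every nn.Linear child whose attribute name is in target_modules."""
    replaced = 0
    for parent in list(model.modules()):
        for name, child in list(parent.named_children()):
            if name in target_modules and isinstance(child, nn.Linear):
                setattr(parent, name, LoRALinear(child, rank, alpha))
                replaced += 1
    return replaced


class ToySwiGLU(nn.Module):
    """Stand-in feed-forward block (assumed layout, not the real SymbioSLM block)."""

    def __init__(self, d_model: int = 256, d_hidden: int = 512):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_hidden, bias=False)  # gate projection
        self.v = nn.Linear(d_model, d_hidden, bias=False)   # value projection
        self.w2 = nn.Linear(d_hidden, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(nn.functional.silu(self.w1(x)) * self.v(x))


# Six toy blocks, matching n_layers=6; only w1 and w2 are wrapped.
model = nn.Sequential(*[ToySwiGLU() for _ in range(6)])
n_wrapped = inject_lora(model, target_modules=["w1", "w2"], rank=16, alpha=32.0)
print(f"wrapped {n_wrapped} layers")
```

With two targets across six blocks, the toy model ends up with 12 wrapped layers, and `v` stays a plain `nn.Linear`. Initializing `lora_B` to zero is the usual LoRA choice so that injection does not change the model's output before training.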