---
language: en
license: apache-2.0
tags:
  - symbiogenesis
  - lora
  - grammar
  - cola
  - evolutionary
  - custom-architecture
  - attention-free
base_model: LisaMegaWatts/SymbioSLM
datasets:
  - glue
pipeline_tag: text-classification
---

# SymbioSLM Grammar Expert LoRA

A grammar-specialist LoRA adapter for [SymbioSLM](https://huggingface.co/LisaMegaWatts/SymbioSLM) (~4.3M params), trained on CoLA (Corpus of Linguistic Acceptability) via **symbiogenesis evolution**. This is an **attention-free** model — all sequence mixing uses sub-quadratic organelles (CausalConv, Monarch matrices, LongConv).

SymbioSLM is Julia-native and has no PyTorch checkpoint, so the PyTorch model used here starts from random initialization. The experiment therefore trained with the **full base model unfrozen** alongside LoRA, testing whether the attention-free architecture can learn grammar from scratch.

## Key Results

| Metric | At Gelation (gen 6) | Final (gen 24) |
|--------|-------------------|----------------|
| Train accuracy | 80.4% | 80.4% |
| Test accuracy | 44.6% | **60.6%** |
| Overfit gap | 35.8pp | 19.8pp |

| Metric | Value |
|--------|-------|
| Majority-class baseline | 64.1% |
| Base perplexity | 2045.6 |
| With LoRA perplexity | 2051.0 (+0.3%) |
| Grammar sense improvement | +0.009 (acceptable minus unacceptable log-prob gap) |
| Gelation (convergence) | Generation 6 |
| LoRA params | 2,468,116 (57.9% of base — unfrozen) |

### Grammar Sense Signal

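The LoRA-adapted model assigns relatively higher probability to grammatical sentences than the base model does. The "Ratio" row is the acceptable log-prob minus the unacceptable log-prob, which equals the log of the probability ratio: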

```
                          Base      With LoRA
Acceptable log-prob:     -7.619     -7.617
Unacceptable log-prob:   -7.625     -7.632
Ratio (higher=better):   0.006      0.015  (+150% relative)
```

The gap widens by only 0.009, a small but directionally correct signal from a randomly initialized ~4.3M-parameter attention-free model.
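The figures in the table above can be reproduced with a few lines of arithmetic. This is a minimal sketch that only re-derives the reported numbers; the variable names are illustrative and it does not score any sentences itself.

```python
# Log-probs copied from the table above.
base = {"acceptable": -7.619, "unacceptable": -7.625}
lora = {"acceptable": -7.617, "unacceptable": -7.632}


def log_prob_gap(scores: dict) -> float:
    """Acceptable minus unacceptable log-prob (the log of the probability ratio)."""
    return scores["acceptable"] - scores["unacceptable"]


base_gap = log_prob_gap(base)
lora_gap = log_prob_gap(lora)
improvement = lora_gap - base_gap
relative = improvement / base_gap

print(f"base gap:    {base_gap:.3f}")
print(f"LoRA gap:    {lora_gap:.3f}")
print(f"improvement: {improvement:+.3f} ({relative:+.0%} relative)")
```

A positive gap means the model prefers acceptable sentences; the adapter widens that gap from 0.006 to 0.015.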

## Architecture

SymbioSLM is a **3-organelle** decoder-only language model with NO attention:

- **CausalDepthwiseConv1d** — local n-gram pattern detection
- **MonarchMatrix** (8 heads) — sub-quadratic global mixing via butterfly factorization
- **LongConv** — dense causal convolution for medium-range dependencies
- **OrganelleGate** — learned per-channel blend across organelles

```
SymbioSLM: d_model=256, n_layers=6, n_monarch_heads=8, vocab_size=2000
Total params: 4,261,650
```

The attention-free design means LoRA can only target **SwiGLU layers** (w1, v, w2), giving 3 target types × 6 blocks = 18 possible injection points. Attention-equipped models additionally expose query, key, value, and output projections in every block.
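To make the size of that search space concrete, the sketch below enumerates the injection points and the non-empty target subsets an evolved config could choose from. The module path pattern `blocks.{i}.ffn.{name}` is a hypothetical naming scheme used for illustration, not the actual attribute layout of SymbioSLM.

```python
from itertools import combinations, product

N_BLOCKS = 6
TARGETS = ("w1", "v", "w2")  # SwiGLU projections named in this card

# Hypothetical module paths: one per (block, projection) pair.
injection_points = [
    f"blocks.{i}.ffn.{name}" for i, name in product(range(N_BLOCKS), TARGETS)
]

# Every non-empty choice of target types, e.g. ("w1", "w2").
target_subsets = [
    subset
    for size in range(1, len(TARGETS) + 1)
    for subset in combinations(TARGETS, size)
]

print(len(injection_points), "injection points")
print(len(target_subsets), "candidate target subsets")
```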

## LoRA Configuration

Manual LoRA injection (not PEFT) into SwiGLU feed-forward layers:

| Target | Layer type | Shape per block (in→out) |
|--------|-----------|-----------|
| w1 | SwiGLU gate projection | 256→512 |
| w2 | SwiGLU output projection | 512→256 |

**Best evolved config**: rank=16, alpha=32.0, targets=(w1, w2)

Evolution consistently converged on the **gate+output pair** (w1, w2), preferring this over configurations that include the value projection (v).
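For readers unfamiliar with manual LoRA injection, the following is a minimal PyTorch sketch of the general technique with the evolved settings (rank 16, alpha 32, targets w1 and w2). It is not the code used in training: `LoRALinear`, `inject_lora_sketch`, and `ToySwiGLU` are hypothetical names, and the toy feed-forward block (bias-free linears) only stands in for the real SymbioSLM layers.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Wraps an nn.Linear with a low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.scale = alpha / rank
        # A projects down to `rank`, B projects back up. B starts at zero, so the
        # wrapped layer initially computes exactly what the base layer computes.
        self.lora_A = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


def inject_lora_sketch(model, target_modules=("w1", "w2"), rank=16, alpha=32.0) -> int:
    """Replace each nn.Linear child whose attribute name is a target; return the count."""
    replaced = 0
    for parent in list(model.modules()):
        for name, child in list(parent.named_children()):
            if name in target_modules and isinstance(child, nn.Linear):
                setattr(parent, name, LoRALinear(child, rank, alpha))
                replaced += 1
    return replaced


class ToySwiGLU(nn.Module):
    """Stand-in feed-forward block with the 256 -> 512 -> 256 shapes listed above."""

    def __init__(self, d_model: int = 256, d_ff: int = 512):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)
        self.v = nn.Linear(d_model, d_ff, bias=False)
        self.w2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.v(x))


blocks = nn.ModuleList([ToySwiGLU() for _ in range(6)])
n_wrapped = inject_lora_sketch(blocks)
n_lora = sum(p.numel() for n, p in blocks.named_parameters() if "lora_" in n)
print(f"wrapped {n_wrapped} layers, {n_lora:,} adapter parameters")
```

Under these assumptions each wrapped layer adds 16 × (256 + 512) = 12,288 adapter parameters, so the w1+w2 config adds 12 such layers across the 6 blocks.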

## Evolution Details

1. **Population**: 8 random LoRAUnit configs
2. **Training**: 200 steps per unit, lr=2e-4, batch=16, **base unfrozen** (no pre-trained checkpoint)
3. **Fitness**: `accuracy - 0.01 × log(n_trainable)`
4. **Gelation**: CUSUM change-point at generation 6 (CUSUM=4.10)
5. **Post-gelation**: Architecture locked (r=16, w1+w2) but test accuracy continued improving
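The fitness formula and the change-point test in the steps above can be sketched as follows. Several details are assumptions not stated in this card: accuracy is taken as a fraction in [0, 1], the log is natural, and the CUSUM shown is the textbook one-sided form with hypothetical `target`, `slack`, and `threshold` parameters. The statistic actually monitored during evolution, and its threshold, may differ.

```python
import math
from typing import Optional, Sequence


def fitness(accuracy: float, n_trainable: int, penalty: float = 0.01) -> float:
    """accuracy - 0.01 * log(n_trainable): rewards accuracy, penalizes size."""
    return accuracy - penalty * math.log(n_trainable)


def cusum_first_alarm(
    values: Sequence[float], target: float, slack: float, threshold: float
) -> Optional[int]:
    """One-sided CUSUM: return the first index where the cumulative upward
    deviation from `target` (beyond `slack`) exceeds `threshold`, else None."""
    s = 0.0
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None


# Same accuracy, fewer trainable parameters -> higher fitness.
print(fitness(0.80, 150_000) > fitness(0.80, 2_500_000))

# Synthetic series that shifts upward at index 3.
series = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(cusum_first_alarm(series, target=0.0, slack=0.5, threshold=1.2))
```

The size penalty is mild: going from 150K to 2.5M trainable parameters costs under 0.03 fitness, so an accuracy gain of a few points outweighs it.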

### Test Accuracy Over Time

```
Gen  0: 40.4%
Gen  5: 54.0%  (pre-gelation)
Gen  6: 40.0%  (at gelation)
Gen 10: 61.2%
Gen 15: 57.4%
Gen 20: 56.0%
Gen 24: 60.6%  (final)
```

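Test accuracy oscillated but trended upward, from 40.4% at generation 0 to 60.6% at generation 24, with the highest logged value (61.2%) at generation 10. This suggests that continued evolution post-gelation was beneficial for this model. Gelation marked architecture convergence, not a generalization peak.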
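One way to quantify the upward trend is a least-squares slope over the logged checkpoints. The sketch below uses only the seven values listed above; it is an illustrative check, not part of the original analysis, and seven noisy points give only a rough estimate.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


generations = [0, 5, 6, 10, 15, 20, 24]
test_acc = [40.4, 54.0, 40.0, 61.2, 57.4, 56.0, 60.6]  # percent, from the log above

slope = ols_slope(generations, test_acc)
print(f"trend: {slope:+.2f} percentage points per generation")
```

The fitted slope is positive, at roughly three quarters of a percentage point per generation across these checkpoints.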

## Usage

Loading the adapter requires the SymbioSLM model architecture. See the training notebook in the [SymbioGPT repository](https://github.com/DavinciDreams/SymbioGPT) for the full model definition.

```python
import torch
from huggingface_hub import hf_hub_download

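# Download the LoRA state dict from the Hub
weights_path = hf_hub_download(
    repo_id="LisaMegaWatts/SymbioSLM-GrammarExpert-20260301",
    filename="lora_state.pt",
)
# weights_only=True restricts unpickling to tensors and plain containers
lora_state = torch.load(weights_path, map_location="cpu", weights_only=True)

# Inject into a SymbioSLM base model instance (`model`), then load the adapter weights
# inject_lora(model, target_modules=['w1', 'w2'], rank=16, alpha=32.0)
# load_lora_state(model, lora_state)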
```

## Files

| File | Description |
|------|-------------|
| `lora_state.pt` | LoRA A/B parameter state dict (696 KB) |
| `experiment_config.json` | Full experiment config and results |

## Part of Symbiogenesis

This is part of a **3-model grammar expert comparison**:

| Model | Params | Attention | CoLA Test Acc | Status |
|-------|--------|-----------|---------------|--------|
| [Ouroboros (Gemma 270M)](https://huggingface.co/LisaMegaWatts/Ouroboros-1MContext-Gemma-270m) | 270M | Yes (standard) | Pending | Notebook ready |
| [SymbioGPT-10M](https://huggingface.co/LisaMegaWatts/SymbioGPT-GrammarExpert-20260301) | 10M | Yes (+ organelles) | 53.2% | Complete |
| **SymbioSLM ~4M** (this) | 4.3M | **No** | **60.6%** | Complete |

**W&B run**: [grammar-expert-symbioslm](https://wandb.ai/lisamegawatts-decentralized-intelligence-agency/symbiogenesis/runs/0qlysv4e)

GitHub: [DavinciDreams/SymbioGPT](https://github.com/DavinciDreams/SymbioGPT)