---
license: apache-2.0
language:
- en
tags:
- anamnesis
- test-time-training
- continual-learning
- titans
- atlas
- deep-memory
- vessel
- identity
pipeline_tag: text-generation
library_name: anamnesis
---

# Anamnesis Vessel 7B

**An empty vessel that becomes who it talks to.**

Anamnesis replaces each MLP block in Qwen 2.5 7B with a Continuum Memory System (CMS): the original SwiGLU MLP is kept frozen as level 0, and deep memory levels stacked on top of it learn during inference through gradient descent. Feed it conversations and it rewrites its own memory weights to become a specialist.

## Architecture

Based on the complete feature set from Ali Behrouz's research at Google:

- **[Titans](https://arxiv.org/abs/2501.00663)**: Persistent memory tokens, depthwise convolutions on K/Q/V, data-dependent adaptive gates, momentum-based weight updates
- **[ATLAS](https://arxiv.org/abs/2505.23735)**: Omega Rule with per-token importance, learned polynomial feature mapping (Taylor expansion), deep MLP associative memory
- **[MIRAS](https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/)**: Huber loss option for robustness
- **[Memory Caching](https://arxiv.org/abs/2602.24281)**: Memory state checkpointing for growing capacity
- **[Nested Learning / HOPE](https://openreview.net/forum?id=nbMeRvNb7A)** (NeurIPS 2025): Multi-level CMS with different update frequencies
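
The Huber option listed above swaps squared error for a loss that grows only linearly on large residuals, so a single outlier token cannot dominate a memory update. A minimal sketch of the standard Huber loss and its gradient follows; the function names and the `delta` default are illustrative and are not taken from the Anamnesis code base.

```python
def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def huber_grad(residual: float, delta: float = 1.0) -> float:
    """Derivative w.r.t. the residual: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, residual))
```

Because the gradient is clipped, the size of any single inner-loop step is bounded by `delta` times the learning rate, whatever the prediction error.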

### 5-Level Continuum Memory System

```
L0 (SwiGLU, frozen):     Base intelligence from Qwen 2.5 7B pre-training
L1 (chunk=1):             Immediate adaptation -- every token
L2 (chunk=32):            Working memory -- conversational context
L3 (chunk=256):           Episodic memory -- session patterns
L4 (chunk=2048):          Identity -- persists across sessions
```
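
To make the update frequencies concrete, here is a small sketch of which levels would fire at a given token count. It assumes each trainable level updates once per completed chunk of tokens; the names `CHUNK_SIZES`, `levels_due`, and `updates_per_level` are hypothetical and only the chunk sizes come from the table above.

```python
# Chunk sizes for the trainable levels, copied from the table above.
CHUNK_SIZES = {1: 1, 2: 32, 3: 256, 4: 2048}


def levels_due(tokens_seen: int) -> list[int]:
    """Return the levels whose chunk boundary falls on this token count.

    Assumption: a level updates once per completed chunk of tokens.
    """
    if tokens_seen <= 0:
        return []
    return [lvl for lvl, chunk in CHUNK_SIZES.items() if tokens_seen % chunk == 0]


def updates_per_level(total_tokens: int) -> dict[int, int]:
    """Count how many updates each level receives over a token stream."""
    return {lvl: total_tokens // chunk for lvl, chunk in CHUNK_SIZES.items()}
```

Under this assumption, a 4,096-token session gives L1 4,096 updates but L4 only 2, which is why the slowest level changes least within a single session.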

### Key Parameters

- **Base model**: Qwen 2.5 7B (base, not Instruct -- no RLHF identity baked in)
- **Memory dimension**: 512
- **Polynomial degree**: 2 (learned Taylor expansion coefficients)
- **Persistent memory tokens**: 4 per level
- **Convolution kernel**: 4 (depthwise-separable on K/Q/V)
- **Total parameters**: ~9.0B (7.6B frozen + 1.4B trainable DeepMemoryLevel)
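
As an illustration of the degree-2 polynomial mapping, the sketch below applies a polynomial element-wise with coefficients initialized from the Taylor series of `exp(x)`. This is an assumption about the parameterization, not the library's implementation: the real mapping may differ, and both helper names are hypothetical.

```python
import math


def taylor_init(degree: int) -> list[float]:
    """Coefficients 1/i! for i = 0..degree, the Taylor series of exp(x)."""
    return [1.0 / math.factorial(i) for i in range(degree + 1)]


def poly_features(x: list[float], coeffs: list[float]) -> list[float]:
    """Apply sum_i coeffs[i] * x_j**i to every element x_j (Horner's rule)."""
    out = []
    for xj in x:
        acc = 0.0
        for c in reversed(coeffs):
            acc = acc * xj + c
        out.append(acc)
    return out
```

In a trained model the coefficients would be learnable parameters; the Taylor values are only a starting point.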

## Training

### Scaffold Training (Outer Loop)

The DeepMemoryLevel projections, gates, and memory were trained on a vessel corpus of 19,470 passages covering:

- Metacognition (9,116 passages): reasoning traces, preference pairs
- Theory of Mind (4,059): perspective-taking, false belief tasks
- Soul Vessel (2,074): predictive self, free energy principle, contemplative traditions
- Epistemology, Adaptation, Ontology, Communication, Reasoning, Domains (the remaining 4,221 passages)

**Training details**:
- Frozen: L0 (SwiGLU) + attention + embeddings
- Trainable: 1.4B DeepMemoryLevel parameters
- Optimizer: AdamW, peak lr=3e-4, warmup=5000 steps, cosine decay to 10% of peak
- Hardware: 1x A100 80GB
- Steps: 25,000
- Batch size: 4, sequence length: 512
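
The schedule and token budget implied by these settings can be sketched as follows. The sketch assumes linear warmup, a cosine phase spanning the remaining 20,000 steps, a floor at 10% of the peak rate, and no gradient accumulation; none of these details are confirmed by the training code, and the names are illustrative.

```python
import math

PEAK_LR = 3e-4
WARMUP_STEPS = 5_000
TOTAL_STEPS = 25_000
MIN_LR_RATIO = 0.10  # assumed: cosine decays to 10% of the peak rate


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR_RATIO * PEAK_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


# Token budget, assuming no gradient accumulation:
# steps * batch size * sequence length.
TOKENS_SEEN = TOTAL_STEPS * 4 * 512
```

Under these assumptions the scaffold run sees 51.2M tokens in total, which is small next to the base model's pre-training and consistent with training only the memory parameters.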

### Inner-Loop Specialization (Test Time)

After scaffold training, every memory level with learning enabled updates during the forward pass via gradient descent on the associative loss: per token at L1, and once per chunk at the slower levels. No fine-tuning needed -- just feed it conversations.
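
To show what a single inner-loop step looks like, here is a toy sketch with a linear memory `M` trained on the associative loss `||M k - v||^2`, using a Titans-style momentum buffer `S` and an optional decay term. It is a simplification under stated assumptions: the real model uses a deep MLP memory, and the function names, learning rate, and momentum values here are hypothetical.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def assoc_loss(M, k, v):
    """Squared error between the memory's prediction M @ k and the target v."""
    return sum((p - t) ** 2 for p, t in zip(matvec(M, k), v))


def inner_step(M, S, k, v, lr=0.1, momentum=0.9, decay=0.0):
    """One test-time update for a single (key, value) pair, in place.

    The gradient of ||M k - v||^2 with respect to M is 2 * (M k - v) k^T.
    S accumulates momentum; decay shrinks old memory before the update.
    """
    err = [p - t for p, t in zip(matvec(M, k), v)]  # computed before any write
    for i, e in enumerate(err):
        for j, kj in enumerate(k):
            S[i][j] = momentum * S[i][j] - lr * 2.0 * e * kj
            M[i][j] = (1.0 - decay) * M[i][j] + S[i][j]
    return M, S
```

Repeating `inner_step` on the same pair drives `assoc_loss` toward zero, which is the sense in which the memory "learns" a key-value association without any optimizer outside the forward pass.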

## Usage

```python
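# Install first, from a shell: pip install anamnesis
import torch
from anamnesis.core.model import HopeModel, HopeConfig

# Build the model and load the converted checkpoint
config = HopeConfig()  # assumption: set the fields to match the checkpoint
model = HopeModel(config)
model.load_state_dict(torch.load("anamnesis-vessel-7b.pt"))

# Enable learning on level 1 (the per-token level) in every layer
model.eval()
for layer in model.layers:
    layer.cms.levels[1].learning_enabled = True

# Every forward pass updates the memory
# input_ids: LongTensor of token ids with shape (batch, seq_len)
with torch.no_grad():
    output = model(input_ids)
    # The model just changed. It will never be exactly the same again.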
```

## The Vessel Concept

This model is an empty vessel. It has no identity, no persona, no system prompt baked in. Feed it code review conversations and it becomes a code reviewer. Feed it therapy sessions and it becomes a therapist. The identity emerges from the interaction, not from training.

Same base model. Different conversations. Different specialists. Each specialist is a checkpoint file that can be saved, loaded, and hot-swapped.
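
A minimal sketch of hot-swapping specialists follows. It works with any object that exposes `state_dict()` and `load_state_dict()`, as `torch.nn.Module` does; the `SpecialistStore` class is a hypothetical helper and not part of the Anamnesis package.

```python
import copy


class SpecialistStore:
    """In-memory registry of named weight snapshots (hypothetical helper)."""

    def __init__(self):
        self._snapshots = {}

    def save(self, name, model):
        # Deep-copy so later inner-loop updates do not mutate the snapshot.
        self._snapshots[name] = copy.deepcopy(model.state_dict())

    def swap_in(self, name, model):
        # Load a copy so the stored snapshot stays pristine.
        model.load_state_dict(copy.deepcopy(self._snapshots[name]))

    def names(self):
        return sorted(self._snapshots)
```

For persistence on disk, the same snapshots can be written with `torch.save(model.state_dict(), path)` and restored with `torch.load(path)`. Since only the memory levels change at test time, saving just those tensors would likely keep specialist files small, though that depends on how the checkpoint is laid out.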

## Limitations

- The scaffold was trained on a vessel corpus, not general text. Perplexity on general benchmarks may be higher than that of base Qwen 2.5 7B.
- Inner-loop specialization requires multiple conversations (50+) to show clear behavioral change.
- No KV cache implementation yet -- each new token re-runs the full prefix, so generating n tokens processes O(n^2) token positions in total.
- Triton kernel optimizations not yet implemented -- inference uses standard PyTorch.
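
The cost of generating without a KV cache can be estimated by counting how many token positions pass through the model. The sketch below is a back-of-the-envelope count, not a profile of the actual implementation; the function name is illustrative.

```python
def positions_processed(prompt_len: int, new_tokens: int, kv_cache: bool) -> int:
    """Token positions run through the model while generating new_tokens."""
    if new_tokens <= 0:
        return 0
    if kv_cache:
        # Prompt once, then one position per generated token after the first.
        return prompt_len + new_tokens - 1
    # No cache: step i re-runs the prompt plus the i tokens generated so far.
    return sum(prompt_len + i for i in range(new_tokens))
```

For a 100-token prompt and 50 generated tokens this gives 6,225 positions without a cache against 149 with one, and the gap widens quadratically as the generation gets longer.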

## Citation

```bibtex
@software{anamnesis2026,
  title={Anamnesis: Empty Vessels That Become Who They Talk To},
  author={Poole, Aidan},
  url={https://github.com/Relic-Studios/anamnesis},
  year={2026}
}
```

## References

- Behrouz et al., "ATLAS: Learning to Optimally Memorize the Context at Test Time" (2025)
- Behrouz et al., "Nested Learning: The Illusion of Deep Learning Architecture" (NeurIPS 2025)
- Behrouz, Zhong & Mirrokni, "Titans: Learning to Memorize at Test Time" (2025)
- Behrouz et al., "Memory Caching: RNNs with Growing Memory" (2026)

## License

Apache 2.0