---
license: apache-2.0
language:
- en
tags:
- anamnesis
- test-time-training
- continual-learning
- titans
- atlas
- deep-memory
- vessel
- identity
pipeline_tag: text-generation
library_name: anamnesis
---

# Anamnesis Vessel 7B

**An empty vessel that becomes who it talks to.**

Anamnesis replaces the frozen MLPs in Qwen 2.5 7B with a Continuum Memory System (CMS) -- deep memory that learns during inference through gradient descent. Feed it conversations and it physically restructures its weights to become a specialist.

## Architecture

Based on the complete feature set from Ali Behrouz's research at Google:

- **[Titans](https://arxiv.org/abs/2501.00663)**: Persistent memory tokens, depthwise convolutions on K/Q/V, data-dependent adaptive gates, momentum-based weight updates
- **[ATLAS](https://arxiv.org/abs/2505.23735)**: Omega Rule with per-token importance, learned polynomial feature mapping (Taylor expansion), deep MLP associative memory
- **[MIRAS](https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/)**: Huber loss option for robustness
- **[Memory Caching](https://arxiv.org/abs/2602.24281)**: Memory state checkpointing for growing capacity
- **[Nested Learning / HOPE](https://openreview.net/forum?id=nbMeRvNb7A)** (NeurIPS 2025): Multi-level CMS with different update frequencies

### 5-Level Continuum Memory System

```
L0 (SwiGLU, frozen): Base intelligence from Qwen 2.5 7B pre-training
L1 (chunk=1):        Immediate adaptation -- every token
L2 (chunk=32):       Working memory -- conversational context
L3 (chunk=256):      Episodic memory -- session patterns
L4 (chunk=2048):     Identity -- persists across sessions
```

### Key Parameters

- **Base model**: Qwen 2.5 7B (base, not Instruct -- no RLHF identity baked in)
- **Memory dimension**: 512
- **Polynomial degree**: 2 (learned Taylor expansion coefficients)
- **Persistent memory tokens**: 4 per level
- **Convolution kernel**: 4 (depthwise-separable on K/Q/V)
- **Total parameters**: ~9.0B (7.6B frozen + 1.4B
  trainable DeepMemoryLevel)

## Training

### Scaffold Training (Outer Loop)

The DeepMemoryLevel projections, gates, and memory were trained on a vessel corpus of 19,470 passages covering:

- Metacognition (9,116 passages): reasoning traces, preference pairs
- Theory of Mind (4,059): perspective-taking, false belief tasks
- Soul Vessel (2,074): predictive self, free energy principle, contemplative traditions
- Epistemology, Adaptation, Ontology, Communication, Reasoning, Domains

**Training details**:

- Frozen: L0 (SwiGLU) + attention + embeddings
- Trainable: 1.4B DeepMemoryLevel parameters
- Optimizer: AdamW, lr=3e-4, warmup=5000 steps, cosine decay to 10%
- Hardware: 1x A100 80GB
- Steps: 25,000
- Batch size: 4, sequence length: 512

### Inner-Loop Specialization (Test Time)

After scaffold training, the memory updates during every forward pass via per-token gradient descent on the associative loss. No fine-tuning needed -- just feed it conversations.

## Usage

```python
# Install first (run in a shell, not in Python): pip install anamnesis
import torch
from anamnesis.core.model import HopeModel, HopeConfig

# Convert and load
# `config` must be a HopeConfig matching this checkpoint (construction not shown here)
model = HopeModel(config)
model.load_state_dict(torch.load("anamnesis-vessel-7b.pt"))

# Enable learning
model.eval()
for layer in model.layers:
    layer.cms.levels[1].learning_enabled = True

# Every forward pass updates the memory
# `input_ids` is a tensor of token IDs produced by the tokenizer
with torch.no_grad():
    output = model(input_ids)

# The model just changed. It will never be exactly the same again.
```

## The Vessel Concept

This model is an empty vessel. It has no identity, no persona, no system prompt baked in. Feed it code review conversations and it becomes a code reviewer. Feed it therapy sessions and it becomes a therapist. The identity emerges from the interaction, not from training.

Same base model. Different conversations. Different specialists. Each specialist is a checkpoint file that can be saved, loaded, and hot-swapped.

## Limitations

- The scaffold was trained on a vessel corpus, not general text.
  Perplexity (PPL) on general benchmarks may be higher than base Qwen 2.5 7B.
- Inner-loop specialization requires multiple conversations (50+) to show clear behavioral change.
- No KV cache implementation yet -- generation is O(n^2) in sequence length.
- Triton kernel optimizations not yet implemented -- inference uses standard PyTorch.

## Citation

```bibtex
@software{anamnesis2026,
  title={Anamnesis: Empty Vessels That Become Who They Talk To},
  author={Poole, Aidan},
  url={https://github.com/Relic-Studios/anamnesis},
  year={2026}
}
```

## References

- Behrouz et al., "ATLAS: Learning to Optimally Memorize the Context at Test Time" (2025)
- Behrouz et al., "Nested Learning: The Illusion of Deep Learning Architecture" (NeurIPS 2025)
- Behrouz, Zhong & Mirrokni, "Titans: Learning to Memorize at Test Time" (2025)
- Behrouz et al., "Memory Caching: RNNs with Growing Memory" (2026)

## License

Apache 2.0
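
## Appendix: Scaffold Learning-Rate Schedule (Sketch)

The training details above list AdamW with lr=3e-4, 5000 warmup steps, and cosine decay to 10% over 25,000 steps. The sketch below shows one plausible reading of that schedule in plain Python. It is not taken from the Anamnesis training code: the linear warmup shape, the reading of "10%" as 10% of the peak rate, the decay spanning steps 5,000 to 25,000, and the helper name `scaffold_lr` are all assumptions made for illustration.

```python
import math

PEAK_LR = 3e-4        # from the model card
WARMUP_STEPS = 5_000  # from the model card
TOTAL_STEPS = 25_000  # from the model card
FLOOR_RATIO = 0.10    # "cosine decay to 10%", read here as 10% of peak (assumption)


def scaffold_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Assumption: linear warmup from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to at most 1.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    # Cosine factor falls smoothly from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Interpolate between the peak rate and the 10% floor.
    return PEAK_LR * (FLOOR_RATIO + (1.0 - FLOOR_RATIO) * cosine)


if __name__ == "__main__":
    for s in (0, 2_500, 5_000, 15_000, 25_000):
        print(f"step {s:>6}: lr = {scaffold_lr(s):.2e}")
```

Under these assumptions the rate peaks at 3e-4 when warmup ends and settles at 3e-5 on the final step. In a PyTorch training loop, a function of this shape (divided by the peak rate) could be passed to `torch.optim.lr_scheduler.LambdaLR` as the multiplier.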