Add cpuflow-v97-memory model weights and model card

Files changed (3)

README.md +95 -0
best.pt +3 -0
tokenizer.json +0 -0

README.md ADDED

	@@ -0,0 +1,95 @@

+---
+language: en
+license: mit
+library_name: pytorch
+tags:
+  - text-generation
+  - cpu
+  - cpu-inference
+  - cpuflow
+  - flashlm
+---
+# CPUFlow v9.7 — Memory-Enhanced Semi-Coherent Model
+The best semi-coherent model in the CPUFlow series. It adds a RAM-Net sparse memory sidepath to the v5-LN cumsum backbone, lowering validation perplexity by 1.71 (11.94 → 10.23) without degrading coherence.
+## Results
+| Metric | v5-LN (baseline) | v9.7 (memory-enhanced) |
+| --- | --- | --- |
+| Val PPL | 11.94 | **10.23** |
+| Parameters | 2.0M | 2.47M |
+| Speed | 7,833 tok/s | 3,369 tok/s |
+| Coherent? | Semi | Semi |
+| NaN events | 0 | 0 |
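The headline ratios quoted in this card can be rechecked from the table. The snippet below is a small sanity check, not part of the training code; it assumes the reported perplexity is the exponential of the mean per-token cross-entropy, so the loss gap is the log of the perplexity ratio.

```python
import math

# Figures copied from the results table above.
baseline = {"ppl": 11.94, "params_m": 2.0, "tok_per_s": 7833}
memory = {"ppl": 10.23, "params_m": 2.47, "tok_per_s": 3369}

# Absolute perplexity improvement.
ppl_drop = baseline["ppl"] - memory["ppl"]
# Assumption: PPL = exp(mean cross-entropy), so the loss gap is a log ratio.
loss_drop = math.log(baseline["ppl"] / memory["ppl"])
# Throughput cost and parameter growth of the memory sidepath.
slowdown = baseline["tok_per_s"] / memory["tok_per_s"]
param_growth = memory["params_m"] / baseline["params_m"] - 1.0

print(f"PPL drop:         {ppl_drop:.2f}")
print(f"Loss drop:        {loss_drop:.3f} nats/token")
print(f"Slowdown:         {slowdown:.2f}x")
print(f"Parameter growth: {param_growth:.1%}")
```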
+## Architecture
+```
+embed + CumStepPos → [RAMScanBlock × 6] → LayerNorm → tied output + FSP
+RAMScanBlock:
+  # Cumsum backbone (same as v5-LN)
+  x_n = LayerNorm(x)
+  h = W_proj(x_n)            # fused: d → 3k
+  query, key, value = chunk(h, 3)
+  key = sigmoid(key); value = tanh(value)
+  scan_out = W_m(query * cumsum(key*value) / cumsum(key))
+  # RAM-Net sparse memory sidepath
+  addr = W_addr(x_n) → Product Softmax → Top-8 of 512 virtual slots
+  mem_out = sparse_read_write(addr, x_n)
+  merged = scan_out + W_mem_proj(mem_out)    # direct addition, no gate
+  x = x + W_out(merged)
+  x = x + ff_down(relu(ff_up(LayerNorm(x))))
+```
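The cumsum step in the pseudocode above is a causal, key-weighted running average of the values, scaled by the query. The following dependency-free sketch shows that one step for a single channel; the function name `cumsum_scan` and the list-based layout are illustrative assumptions, and the projections `W_proj` and `W_m` are left out.

```python
import math
from itertools import accumulate


def sigmoid(z):
    # Equivalent to 1 / (1 + exp(-z)), written with tanh to avoid overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def cumsum_scan(query, key_logits, value_logits):
    """Causal gated running average for one channel (illustrative only).

    Position t yields query[t] * sum_{s<=t}(k_s * v_s) / sum_{s<=t}(k_s),
    with k = sigmoid(key_logits) and v = tanh(value_logits).
    """
    keys = [sigmoid(z) for z in key_logits]
    values = [math.tanh(z) for z in value_logits]
    numer = accumulate(k * v for k, v in zip(keys, values))
    denom = accumulate(keys)
    return [q * n / d for q, n, d in zip(query, numer, denom)]


if __name__ == "__main__":
    out = cumsum_scan([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.5, -0.5, 0.5])
    print(out)
```

Because every position averages over its whole prefix, earlier tokens are never overwritten, only diluted. That is consistent with the state blending described under Limitations.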
+## Generation Samples
+Prompt: "Lily and Tim went to the park. They"
+> ...They saw many kids playing near the back house. They went up to a tree and gave them to their dad. They were very happy. After a while, they saw a big pile of ants. It was not a normal day. They did not want to play hide behind. Tim and his friends were scared, but they did not want to go home.
+Prompt: "There was a little girl named Lily. She loved to play with her friends. One day"
+> ...she put her shoes in the park. In the park, Lily saw a big lock on the ground. She wanted to open it. She tried to open the key, but it was too small. She tried to unlock the door open, but she could not.
+## Limitations
+- Semi-coherent at best. Named characters and pronoun tracking work early on, but coherence breaks down after roughly 100 tokens.
+- Semantic confusion from cumsum state blending, e.g. "She tried to open the key".
+- Story drifts between scenes with no transition (park → church).
+- 2.3x slower than the v5-LN baseline (3,369 vs. 7,833 tok/s) due to memory overhead.
+- Trained on TinyStories only — children's vocabulary, no general knowledge.
+## Key Finding
+Sparse memory (RAM-Net Product Softmax, 512 slots, Top-8) improves PPL by 1.7 points as a parameter-efficient capacity expansion. It does NOT solve entity tracking: at 2.5M params the model is far below the binding threshold (~160M params), so entity-specific addressing is not possible. The memory only adds raw capacity.
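One way to read "Product Softmax, 512 slots, Top-8" is sketched below. The card does not give the exact layout, so the 16 × 32 factorization, the function name `product_topk_address`, and the renormalization of the selected weights are all assumptions made for illustration. Under that reading, 48 address logits index 512 virtual slots, and only 8 slots are touched per token.

```python
import heapq
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def product_topk_address(row_logits, col_logits, k=8):
    """Return (slot_index, weight) pairs for the k most probable slots.

    Assumption: the slot grid is rows x cols, and a slot's probability is
    the product of its row softmax and column softmax probabilities.
    """
    p_row = softmax(row_logits)
    p_col = softmax(col_logits)
    n_cols = len(p_col)
    joint = (
        (r * n_cols + c, pr * pc)
        for r, pr in enumerate(p_row)
        for c, pc in enumerate(p_col)
    )
    top = heapq.nlargest(k, joint, key=lambda item: item[1])
    norm = sum(w for _, w in top)
    # Renormalize so the selected read/write weights sum to 1.
    return [(slot, w / norm) for slot, w in top]


if __name__ == "__main__":
    rows = [0.1 * i for i in range(16)]   # hypothetical address logits
    cols = [0.05 * j for j in range(32)]  # hypothetical address logits
    for slot, weight in product_topk_address(rows, cols):
        print(slot, round(weight, 4))
```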
+## Usage
+```python
+import torch
+from tokenizers import Tokenizer
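+# Load the tokenizer and checkpoint shipped in this repo
+tokenizer = Tokenizer.from_file("tokenizer.json")
+checkpoint = torch.load("best.pt", map_location="cpu")  # load onto CPU
+# Build the model and load the weights from `checkpoint`
+# (see train_cpuflow_v97_simple_memory.py for the full architecture)
+# Then generate with temperature=0.8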
+```
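The snippet above stops before generation. As a minimal sketch of what sampling at temperature 0.8 means, the helper below draws one token id from a list of next-token logits. `sample_with_temperature` is a hypothetical name and not part of the FlashLM code; the model's real generation loop lives in the training script.

```python
import math
import random


def sample_with_temperature(logits, temperature=0.8, rng=random):
    """Sample a token id from unnormalized logits scaled by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them itself.
    weights = [math.exp(z - m) for z in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    fake_logits = [2.0, 1.0, 0.5, -1.0]  # stand-in for one model output row
    print([sample_with_temperature(fake_logits, 0.8, rng) for _ in range(10)])
```

A temperature below 1 sharpens the distribution toward the highest-scoring tokens, which trades some variety for fewer off-topic continuations.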
+See [GitHub](https://github.com/changcheng967/FlashLM) for full training code.
+## Citation
+```bibtex
+@misc{Chang,
+  title        = {FlashLM: CPU-Native Language Models Trained From Scratch on Free-Tier Hardware},
+  author       = {Chang, Cheng},
+  year         = {2026},
+  publisher    = {Zenodo},
+  doi          = {10.5281/zenodo.20113960}
+}
+```
+MIT License.

best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f10af7d008c1789a9b83001d5e9c2bffcbe20408b41c2954a8758ab19c38f019
+size 9939155

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff