changcheng967 committed
Commit 5f945c0 · verified · 1 parent: 79d1bf7

Add cpuflow-v97-memory model weights and model card

Files changed (3)
  1. README.md +95 -0
  2. best.pt +3 -0
  3. tokenizer.json +0 -0
README.md ADDED
---
language: en
license: mit
library_name: pytorch
tags:
- text-generation
- cpu
- cpu-inference
- cpuflow
- flashlm
---

# CPUFlow v9.7: Memory-Enhanced Semi-Coherent Model

The best semi-coherent model in the CPUFlow series. It adds RAM-Net sparse memory to the v5-LN cumsum backbone. This lowers validation perplexity by 1.7 points (11.94 → 10.23) while keeping the same semi-coherent output quality.

## Results

| Metric | v5-LN (baseline) | v9.7 (memory-enhanced) |
| --- | --- | --- |
| Val PPL | 11.94 | **10.23** |
| Parameters | 2.0M | 2.47M |
| Speed | 7,833 tok/s | 3,369 tok/s |
| Coherent? | Semi | Semi |
| NaN events | 0 | 0 |

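As a quick consistency check, a few lines of Python reproduce the headline deltas from the table. The numbers are copied from the table above. The loss conversion relies only on the identity that the log of perplexity is the mean cross-entropy in nats.

```python
import math

# Figures copied from the Results table
baseline = {"ppl": 11.94, "params_m": 2.0, "tok_s": 7833}
memory = {"ppl": 10.23, "params_m": 2.47, "tok_s": 3369}

ppl_drop = baseline["ppl"] - memory["ppl"]                   # absolute PPL gain
rel_ppl_drop = ppl_drop / baseline["ppl"]                    # relative PPL gain
slowdown = baseline["tok_s"] / memory["tok_s"]               # throughput ratio
extra_params = memory["params_m"] / baseline["params_m"] - 1 # relative parameter growth

# ln(PPL) is the mean cross-entropy in nats per token
loss_base, loss_mem = math.log(baseline["ppl"]), math.log(memory["ppl"])

print(f"PPL drop: {ppl_drop:.2f} ({rel_ppl_drop:.1%})")
print(f"Slowdown: {slowdown:.2f}x, extra params: {extra_params:.1%}")
print(f"Val loss: {loss_base:.3f} -> {loss_mem:.3f} nats/token")
```

The script reports a 1.71-point (14.3%) perplexity drop, a 2.33x slowdown, and 23.5% more parameters. These match the rounded "1.7 PPL" and "2.3x" figures quoted in this card.
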
## Architecture

```
embed + CumStepPos → [RAMScanBlock × 6] → LayerNorm → tied output + FSP

RAMScanBlock:
  # Cumsum backbone (same as v5-LN)
  x_n = LayerNorm(x)
  h = W_proj(x_n)                           # fused: d → 3k
  query, key, value = chunk(h, 3)
  key = sigmoid(key); value = tanh(value)
  scan_out = W_m(query * cumsum(key*value) / cumsum(key))   # cumsum over the sequence axis

  # RAM-Net sparse memory sidepath
  addr = W_addr(x_n) → Product Softmax → Top-8 of 512 virtual slots
  mem_out = sparse_read_write(addr, x_n)
  merged = scan_out + W_mem_proj(mem_out)   # direct addition, no gate

  x = x + W_out(merged)
  x = x + ff_down(relu(ff_up(LayerNorm(x))))
```

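To make the cumsum backbone concrete, here is a minimal PyTorch sketch of one RAMScanBlock with the memory sidepath left out. It is not the author's implementation. The following are all assumptions:

- the class name;
- the `(batch, seq, d_model)` layout;
- the `W_m` shape (k → k);
- the feed-forward width (`ff_mult`);
- the small `eps` added to the denominator.

```python
import torch
import torch.nn as nn


class CumsumScanBlockSketch(nn.Module):
    """Hypothetical RAMScanBlock without the RAM-Net memory sidepath.

    Assumed layout: x is (batch, seq, d_model); cumsum runs over seq.
    """

    def __init__(self, d_model: int, k: int, ff_mult: int = 4, eps: float = 1e-6):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.w_proj = nn.Linear(d_model, 3 * k)               # fused projection: d -> 3k
        self.w_m = nn.Linear(k, k)                            # assumed k -> k
        self.w_out = nn.Linear(k, d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff_up = nn.Linear(d_model, ff_mult * d_model)    # assumed FF width
        self.ff_down = nn.Linear(ff_mult * d_model, d_model)
        self.eps = eps                                        # assumed guard against tiny sums

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_n = self.norm1(x)
        query, key, value = self.w_proj(x_n).chunk(3, dim=-1)
        key, value = torch.sigmoid(key), torch.tanh(value)
        # Gated running average of past values: step t only sees steps <= t
        state = torch.cumsum(key * value, dim=1) / (torch.cumsum(key, dim=1) + self.eps)
        scan_out = self.w_m(query * state)
        # In v9.7, W_mem_proj(mem_out) would be added to scan_out at this point.
        x = x + self.w_out(scan_out)
        return x + self.ff_down(torch.relu(self.ff_up(self.norm2(x))))
```

Every operation except `cumsum` acts on each position independently. So the block is causal without an attention mask, and its cost grows linearly with sequence length.
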
## Generation Samples

Prompt: "Lily and Tim went to the park. They"
> ...They saw many kids playing near the back house. They went up to a tree and gave them to their dad. They were very happy. After a while, they saw a big pile of ants. It was not a normal day. They did not want to play hide behind. Tim and his friends were scared, but they did not want to go home.

Prompt: "There was a little girl named Lily. She loved to play with her friends. One day"
> ...she put her shoes in the park. In the park, Lily saw a big lock on the ground. She wanted to open it. She tried to open the key, but it was too small. She tried to unlock the door open, but she could not.

## Limitations

- Semi-coherent at best. Named characters and pronouns are tracked early on, but coherence breaks down after roughly 100 tokens.
- Semantic confusion such as "She tried to open the key", attributed to state blending in the cumsum backbone.
- Stories drift between scenes with no transition (e.g. park → church).
- 2.3x slower than the v5-LN baseline (3,369 vs 7,833 tok/s) because of the memory overhead.
- Trained on TinyStories only: children's vocabulary and no general knowledge.

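The "cumsum state blending" limitation can be seen in a toy example. The normalized cumsum in the backbone is a gated running average of every value written so far. Once two entities have been written with similar gates, the read-out holds an even mix of both. The gate and value numbers below are made up for illustration and do not come from the model.

```python
import torch

# Made-up write gates (sigmoid outputs in (0, 1)) for three steps
keys = torch.tensor([0.9, 0.9, 0.2])
# One-hot "meanings": step 0 writes entity A, step 1 writes entity B, step 2 writes nothing
values = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 0.0]])

# Same normalization as the backbone: cumsum(key * value) / cumsum(key) over time
state = torch.cumsum(keys[:, None] * values, dim=0) / torch.cumsum(keys, dim=0)[:, None]

for t, row in enumerate(state.tolist()):
    print(t, [round(v, 3) for v in row])
```

After step 1 the state is an equal-weight mix of the two entities. A query reading it sees A and B equally, and each further write dilutes both a little more.
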
## Key Finding

Sparse memory (RAM-Net Product Softmax, 512 slots, Top-8) lowers validation PPL by 1.7 points. It acts as a parameter-efficient capacity expansion (+0.47M parameters). It does NOT solve entity tracking: at ~2.5M parameters the model is far below the binding threshold (~160M parameters), which makes entity-specific addressing impossible. The memory only adds raw capacity.

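The following is a hypothetical read-only sketch of product-softmax addressing. It forms an address over 512 virtual slots from two small softmaxes, then restricts it to the Top-8 slots. The following are assumptions, not details from the card:

- the 16 × 32 factorization of the 512 slots;
- the learned slot table;
- the class name.

The card's `sparse_read_write` also writes to memory, which this sketch does not model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProductSoftmaxReadSketch(nn.Module):
    """Hypothetical Top-k product-softmax read over n_a * n_b virtual slots."""

    def __init__(self, d_model: int, d_mem: int, n_a: int = 16, n_b: int = 32, top_k: int = 8):
        super().__init__()
        self.n_a, self.n_b, self.top_k = n_a, n_b, top_k
        self.w_addr = nn.Linear(d_model, n_a + n_b)          # logits for both factors
        self.slots = nn.Parameter(torch.randn(n_a * n_b, d_mem) * 0.02)  # assumed learned table

    def forward(self, x_n: torch.Tensor) -> torch.Tensor:
        logits_a, logits_b = self.w_addr(x_n).split([self.n_a, self.n_b], dim=-1)
        p_a = F.softmax(logits_a, dim=-1)                     # (..., n_a)
        p_b = F.softmax(logits_b, dim=-1)                     # (..., n_b)
        # Outer product: a distribution over all n_a * n_b = 512 slots
        p = (p_a.unsqueeze(-1) * p_b.unsqueeze(-2)).flatten(-2)
        weights, idx = p.topk(self.top_k, dim=-1)             # keep only the Top-k slots
        weights = weights / weights.sum(dim=-1, keepdim=True) # renormalize over the kept slots
        picked = self.slots[idx]                              # (..., top_k, d_mem)
        return (weights.unsqueeze(-1) * picked).sum(dim=-2)   # weighted read-out
```

For clarity the sketch materializes all 512 scores. A slot can only reach the overall Top-8 if its row index is in the Top-8 of the first softmax and its column index is in the Top-8 of the second (ties aside). So a cheaper variant can score just the 64 candidate pairs.
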
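## Usage

```python
import torch
from tokenizers import Tokenizer

# Load the tokenizer and the raw checkpoint onto the CPU.
# Note: PyTorch >= 2.6 defaults to weights_only=True in torch.load; if the
# checkpoint stores custom Python objects, loading may need weights_only=False
# (only do this for files you trust).
tokenizer = Tokenizer.from_file("tokenizer.json")
checkpoint = torch.load("best.pt", map_location="cpu")

# Build the model (see train_cpuflow_v97_simple_memory.py for the full
# architecture), load the weights from `checkpoint`, then generate with
# temperature=0.8.
```
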
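The "generate with temperature=0.8" step can be sketched as a standard temperature-sampling loop. This is a hypothetical helper, not part of the FlashLM code. It assumes the built model maps a `(batch, seq)` tensor of token ids to `(batch, seq, vocab)` logits.

```python
import torch


@torch.no_grad()
def generate(model, ids: torch.Tensor, max_new_tokens: int, temperature: float = 0.8) -> torch.Tensor:
    """Hypothetical sampling loop for a causal LM returning (batch, seq, vocab) logits."""
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                        # logits for the next token
        probs = torch.softmax(logits / temperature, dim=-1)  # temperature < 1 sharpens
        next_id = torch.multinomial(probs, num_samples=1)    # sample one token per row
        ids = torch.cat([ids, next_id], dim=1)               # append and continue
    return ids
```

With the card's tokenizer, turn a prompt into a batch with `torch.tensor([tokenizer.encode(prompt).ids])`. After generating, `tokenizer.decode(out[0].tolist())` turns the result back into text.
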
See [GitHub](https://github.com/changcheng967/FlashLM) for the full training code.

## Citation

```bibtex
@misc{Chang,
  title = {FlashLM: CPU-Native Language Models Trained From Scratch on Free-Tier Hardware},
  author = {Chang, Cheng},
  year = {2026},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.20113960}
}
```

MIT License.
best.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:f10af7d008c1789a9b83001d5e9c2bffcbe20408b41c2954a8758ab19c38f019
size 9939155
tokenizer.json ADDED
The diff for this file is too large to render.