{
  "entry_class": "model.HornerRNN",
  "output_base": 2,
  "framework": "pytorch",
  "model_description": "Bit-sequential RNN (~15.4M params across three weight files) for modular multiplication with primes up to 2^2048. The model reads the bits of a mod p MSB-first, one per step, conditioned on (b mod p, p) in binary. Its hidden state is a hard-quantized bit vector, and the transition function is a learned carry-aware dilated-convolution TCN trained to implement the Horner step (t, bit, b, p) -> (2*t + bit*b) mod p. The final hidden state bits are emitted MSB-first as the base-2 answer. Routing uses the narrowest state width that can hold p. A SINGLE shared TCN weight-set, weights_shared_16_512.pt, serves 16/32/64/128/256/512-bit states (tiers 1-8) at each prime's native width and reaches tiers 1-8 = 1.00 on the public benchmark. Dedicated TCN cells weights1024.pt and weights2048.pt cover tier 9 and tier 10, reaching tier 9 = 0.99 and tier 10 = 1.00. For p >= 2^2048 the model emits the honest [0] fallback without invoking the network. The arithmetic is not in Python code: tokenization, scan, threshold, and readout are architecture, while doubling, conditional add, compare/borrow, and reduction are all learned in the trained cell weights; random or perturbed weights collapse to the floor.",
  "training_description": "Each cell is trained from exact single-step labels (t, bit, b, p) -> (2*t + bit*b) mod p, with BCE per state bit, AdamW, cosine decay, gradient clipping, EMA checkpointing, and held-out-prime validation. Training data uses true Horner-trajectory states plus boundary-focused examples; prime sampling is value-uniform to match the challenge generator, with bit-length-uniform bands where needed so the reduction boundary is seen at every position. The current shared 16-512 file was built by warm-starting from the shipped shared 64-512 carry-aware TCN (14 residual TCN blocks, 256 channels, dilations cycling through 1..256), fine-tuning on a {16,32,64,128,256,512} width mix, then averaging that run with a small-tier polish tail: soup25 = 0.75 * unified_16to512_warm_s0.final + 0.25 * unified_16to512_smalltail_s1.final. The warm run used a 16-heavy but 512-preserving distribution (16:.40,32:.18,64:.08,128:.08,256:.08,512:.18), accum=8, lr=2e-4, off-trajectory batches (offtraj-frac=.20,k=4), and width-native validation. The polish tail warm-started from that candidate with lower lr=6e-5 and weights 16:.55,32:.25,64:.04,128:.04,256:.04,512:.08, then the soup was selected by public score plus matched 5-prime bootstrap. On the fixed public benchmark this merge lifts the shipped model from overall 0.997 to 0.999: tiers 1-8 = 1.00, tier 9 = 0.99, tier 10 = 1.00. A matched faithful bootstrap over tiers 1-8 (5 primes/tier structure, pool120, k30, boot200k, seed 515151) ties tiers 1-2 and improves tiers 3-8; tier 8 E[acc] improves 0.9866 -> 0.9931 and P(tier8<0.95) drops 1.396% -> 0.205%. The 1024-bit cell was trained separately with benchmark-width-matched primes in [2^513,2^1024), gradient accumulation, and worst-bit margin loss; it remains byte-unchanged and scores tier 9 = 0.99. The 2048-bit cell was bootstrapped from the 1024-bit cell by octave transfer, hardened with low-lr margin tails, then weight-souped across independent margin tails; it remains byte-unchanged and scores tier 10 = 1.00 while reducing faithful 5-prime tier-10 tail risk. Full all-width single-cell unification including 1024/2048 was tested and rejected because one ~5M cell could not preserve 2048-chain robustness while serving small/mid widths; the shipped design intentionally keeps dedicated 1024 and 2048 cells. Compliance checks: preprocess hooks are identity, the legal two-operand reductions a%p and b%p are used only for input normalization, perturbing trained weights collapses accuracy toward the untrained floor, and held-out-prime generalization tracks train accuracy."
}