Phase 5 · char-LM

Postnet × Cloudflare — char-LM

Real ML task: next-character prediction. Phase 8 upgrades the model from a mode-collapsing bigram to a context-2 MLP (embedding V×E, fc1 2E×H, ReLU, fc2 H×V, plus biases) with 2 379 parameters over a 27-character vocabulary. Phase 6 ships sharded R2 snapshots fetched in parallel. Phase 7 ships federated data shards: each tab scores proposals on its private slice of the text, while the shared model fits the union. Random-init loss ≈ ln V = ln 27 ≈ 3.30 nats.
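The parameter budget and the ≈ ln V starting loss can be checked with a small TypeScript sketch. Only V = 27 and the 2 379 total come from the page. The widths E = 16 and H = 32 are assumptions (one split that reproduces 2 379 exactly), as are the row-major weight layout and the names `paramCount`, `zeroMlp`, `logits` and `nll`.

```typescript
// Hypothetical sizes: only V = 27 and the 2 379 total come from the page.
// E = 16, H = 32 is one split that reproduces that total exactly.
const V = 27;
const E = 16;
const H = 32;

// Embedding V×E (no bias) + fc1 (2E×H weights, H biases) + fc2 (H×V weights, V biases).
function paramCount(v: number, e: number, h: number): number {
  return v * e + (2 * e * h + h) + (h * v + v);
}

interface Mlp {
  emb: Float32Array; // V×E; row c embeds character id c
  w1: Float32Array; // (2E)×H, row-major
  b1: Float32Array; // H
  w2: Float32Array; // H×V, row-major
  b2: Float32Array; // V
}

// All-zero parameters: every logit is 0, so the loss is exactly ln V.
function zeroMlp(): Mlp {
  return {
    emb: new Float32Array(V * E),
    w1: new Float32Array(2 * E * H),
    b1: new Float32Array(H),
    w2: new Float32Array(H * V),
    b2: new Float32Array(V),
  };
}

// Forward pass: embed the two previous characters, concatenate, fc1 + ReLU, fc2 -> logits.
function logits(m: Mlp, prev2: number, prev1: number): Float32Array {
  const x = new Float32Array(2 * E);
  x.set(m.emb.subarray(prev2 * E, prev2 * E + E), 0);
  x.set(m.emb.subarray(prev1 * E, prev1 * E + E), E);
  const hidden = new Float32Array(H);
  for (let j = 0; j < H; j++) {
    let s = m.b1[j];
    for (let i = 0; i < 2 * E; i++) s += x[i] * m.w1[i * H + j];
    hidden[j] = Math.max(0, s); // ReLU
  }
  const out = new Float32Array(V);
  for (let k = 0; k < V; k++) {
    let s = m.b2[k];
    for (let j = 0; j < H; j++) s += hidden[j] * m.w2[j * V + k];
    out[k] = s;
  }
  return out;
}

// Cross-entropy (nats) of `target` under softmax(z), using the max-shift trick.
function nll(z: ArrayLike<number>, target: number): number {
  let max = -Infinity;
  for (let k = 0; k < z.length; k++) max = Math.max(max, z[k]);
  let sum = 0;
  for (let k = 0; k < z.length; k++) sum += Math.exp(z[k] - max);
  return Math.log(sum) + max - z[target];
}
```

Here paramCount(27, 16, 32) = 432 + 1 056 + 891 = 2 379. With zero weights, `nll` returns ln 27 ≈ 3.2958 for any target. A small random init lands near the same value, which is why ≈ 3.30 is quoted as the random-init loss.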

Live sample: 100 chars seeded from "t", greedy + temperature.
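"Greedy + temperature" can be read as a single sampler in which temperature 0 means argmax. The sketch below assumes that reading. It also assumes that a one-character seed such as "t" is left-padded with id 0 (taken to be space), giving the context-2 model its two slots. `sampleNext` and `generate` are hypothetical names, and `nextLogits` stands in for the model's forward pass.

```typescript
// Greedy argmax when temperature <= 0; otherwise sample from softmax(z / temperature).
function sampleNext(
  z: ArrayLike<number>,
  temperature: number,
  rand: () => number = Math.random
): number {
  if (temperature <= 0) {
    let best = 0;
    for (let k = 1; k < z.length; k++) if (z[k] > z[best]) best = k;
    return best;
  }
  let max = -Infinity;
  for (let k = 0; k < z.length; k++) max = Math.max(max, z[k]);
  const weights: number[] = [];
  let total = 0;
  for (let k = 0; k < z.length; k++) {
    const w = Math.exp((z[k] - max) / temperature); // shifted by max for numerical safety
    weights.push(w);
    total += w;
  }
  let r = rand() * total; // inverse-CDF draw over the unnormalized weights
  for (let k = 0; k < weights.length; k++) {
    r -= weights[k];
    if (r < 0) return k;
  }
  return weights.length - 1; // guard against floating-point round-off
}

// Roll out n characters after a seed, always feeding the two most recent ids back in.
function generate(
  nextLogits: (prev2: number, prev1: number) => ArrayLike<number>,
  seed: number[],
  n: number,
  temperature: number,
  rand: () => number = Math.random
): number[] {
  const ctx = seed.slice();
  while (ctx.length < 2) ctx.unshift(0); // assumed: id 0 (space) pads a short seed
  const out: number[] = [];
  for (let i = 0; i < n; i++) {
    const id = sampleNext(nextLogits(ctx[ctx.length - 2], ctx[ctx.length - 1]), temperature, rand);
    ctx.push(id);
    out.push(id);
  }
  return out;
}
```

As the temperature shrinks toward 0, the draws approach the greedy argmax. Larger temperatures spread probability onto less likely characters, which is what lets a sample leave a repetitive attractor.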
target text (340 chars): the bird sings every dawn the cat sleeps in the sun the dog runs to the park…
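Before the target text can be split across tabs, it has to become context-2 training pairs. The sketch below makes two assumptions. First, the vocabulary is space plus a–z, with space at id 0. Second, shards are contiguous slices of examples rather than of raw characters, so no pair straddles a boundary. `encode`, `examples` and `shard` are illustrative names.

```typescript
// Assumed vocabulary: space plus a–z (27 symbols), with space at id 0.
const VOCAB = " abcdefghijklmnopqrstuvwxyz";

// Map each character to its id; anything outside the vocabulary is an error.
function encode(text: string): number[] {
  const ids: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const id = VOCAB.indexOf(text.charAt(i));
    if (id < 0) throw new Error("out-of-vocabulary character: " + JSON.stringify(text.charAt(i)));
    ids.push(id);
  }
  return ids;
}

// One (prev2, prev1, next) triple per position that has two characters of context.
type Example = [number, number, number];

function examples(ids: number[]): Example[] {
  const out: Example[] = [];
  for (let i = 2; i < ids.length; i++) out.push([ids[i - 2], ids[i - 1], ids[i]]);
  return out;
}

// Split items into `workers` contiguous, near-equal slices; slice w is worker w's private shard.
// The slices partition the input, so their union is the whole training set.
function shard<T>(items: T[], workers: number): T[][] {
  const out: T[][] = [];
  for (let w = 0; w < workers; w++) {
    const lo = Math.floor((w * items.length) / workers);
    const hi = Math.floor(((w + 1) * items.length) / workers);
    out.push(items.slice(lo, hi));
  }
  return out;
}
```

Under this scheme, the 340-char target yields 338 examples. With four tabs (a hypothetical count), `shard` hands out slices of 84, 85, 84 and 85 examples, and their union is the full set the shared model fits.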

P = 2 379 weights · K = 8 trials per worker · flip size = 6 params per proposal · σ = 0.15. The context-2 MLP escapes the bigram's "the the the" attractor: at R ≈ 1000 the sample starts producing pseudo-words such as "slin", "soge" and "wh". Loss settles around 1.6 nats (down from ≈ 3.30 at init), reached entirely through federated flip-and-accept across the data-sharded workers of Phase 7. Phase 9 replaces the in-tab forward pass with a WebGPU implementation.
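One worker's flip-and-accept round can be sketched as follows, with the page's K = 8, flip size 6 and σ = 0.15 as defaults. The details are assumptions. A flip adds N(0, σ²) noise (drawn with Box–Muller) to 6 randomly chosen parameters. A worker keeps the best of its K trials only if that trial lowers the loss on its own shard. The step where accepted flips from different tabs are merged into the shared model is left out. `workerRound`, `localLoss` and `Proposal` are hypothetical names.

```typescript
// One worker's flip-and-accept round over a flat parameter vector.
// Assumed: Gaussian flips and best-of-K acceptance on the local shard; cross-worker merging not shown.
interface Proposal {
  idx: number[]; // parameter indices touched by the flip (repeats allowed)
  delta: number[]; // noise added at each index
  loss: number; // local-shard loss with the flip applied
}

// Standard normal draw via the Box–Muller transform; 1 - rand() keeps log's argument in (0, 1].
function gaussian(rand: () => number): number {
  const u = 1 - rand();
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

function workerRound(
  params: Float32Array,
  localLoss: (p: Float32Array) => number,
  K = 8,
  flipSize = 6,
  sigma = 0.15,
  rand: () => number = Math.random
): Proposal | null {
  const base = localLoss(params);
  let best: Proposal | null = null;
  for (let t = 0; t < K; t++) {
    const idx: number[] = [];
    const delta: number[] = [];
    for (let j = 0; j < flipSize; j++) {
      idx.push(Math.floor(rand() * params.length));
      delta.push(sigma * gaussian(rand));
    }
    const saved = idx.map((i) => params[i]); // snapshot before touching anything
    for (let j = 0; j < flipSize; j++) params[idx[j]] += delta[j]; // try the flip in place
    const loss = localLoss(params);
    for (let j = flipSize - 1; j >= 0; j--) params[idx[j]] = saved[j]; // exact undo
    if (loss < base && (best === null || loss < best.loss)) best = { idx, delta, loss };
  }
  return best; // null: no trial beat the current local loss
}
```

Each round costs K + 1 = 9 local loss evaluations: one baseline and eight trials. The parameters are restored after every trial, so the caller decides whether to apply the returned `idx`/`delta` locally or send it upstream.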