Space: changcheng967/flashlm-v4-demo

Apply for a GPU community grant: Academic project

by changcheng967 - opened Feb 19

Owner Feb 19

Project Name: FlashLM v5 "Thunderstorm" — Democratizing LLMs via Ternary CPU Optimization

Objective: We are requesting hardware support to train and demo FlashLM v5, an open-source 1.58-bit (ternary) language model. Unlike standard models that require massive GPU clusters, FlashLM is designed for high-performance inference on consumer CPUs: because every weight is -1, 0, or +1, matrix products reduce to pure addition and subtraction, bypassing the floating-point multiplication bottleneck.
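
To make the addition/subtraction point concrete, here is a minimal sketch of a ternary matrix-vector product. It is not taken from the FlashLM code; the function name `ternary_matvec` and the toy values are assumptions for illustration only. It also shows where the "1.58-bit" figure comes from: a three-valued weight carries log2(3) bits of information.

```python
import math


def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector.

    Hypothetical helper for illustration: the inner loop only adds or
    subtracts activations, so no weight multiplication is performed.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: the activation is skipped entirely
        out.append(acc)
    return out


# Toy values, chosen only to exercise all three weight states.
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]

y = ternary_matvec(W, x)

# Information content of one three-valued weight, in bits.
bits_per_weight = math.log2(3)
```

A production kernel would pack the weights into bit fields and vectorize the loop; the sketch only shows that the arithmetic itself needs no multiplications.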

Why this project matters:

Extreme Efficiency: Our v5 architecture has already scored 88% on associative recall benchmarks (up from 3% in v4), which suggests that ternary models can achieve high-level coherence.

Accessibility: We aim to provide a GPT-2/GPT-3-level experience that runs on standard laptops and edge devices, narrowing the global "compute divide."

Open Science: All weights, training code, and our synthetic "Step-by-Step" distillation datasets will be fully open-sourced on Hugging Face.

Hardware Justification:

Current Status: We have reached the limits of free-tier hardware (2-core CPUs). Training a 50M–100M parameter model on a 1B+ token corpus requires higher core counts and larger L3 caches (e.g., EPYC or Xeon) to validate our "Thunderstorm" memory-mapped architecture.
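
As a rough guide to why cache capacity comes up at this model size, the weight footprint can be estimated with a few lines of arithmetic. This is a back-of-the-envelope sketch, not a measurement: it assumes ternary weights are stored packed at 2 bits each, counts weights only (no activations, optimizer state, or higher-precision embeddings), and the helper name `ternary_weight_bytes` is hypothetical.

```python
import math


def ternary_weight_bytes(n_params, bits_per_weight=2):
    """Bytes needed to store n_params weights at the given bit width.

    Assumption: 2-bit packing (four ternary weights per byte), a common
    practical layout; the information-theoretic floor is log2(3) bits.
    """
    return math.ceil(n_params * bits_per_weight / 8)


# The parameter range quoted in the request.
sizes = {}
for n_params in (50_000_000, 100_000_000):
    packed = ternary_weight_bytes(n_params)
    floor = n_params * math.log2(3) / 8
    sizes[n_params] = (packed, floor)
    print(f"{n_params:>11,} params: {packed / 1e6:.1f} MB packed, "
          f"{floor / 1e6:.1f} MB at the log2(3) floor")
```

Under these assumptions the packed weights come to about 12.5 MB for 50M parameters and 25 MB for 100M parameters, a scale at which it is plausible for last-level cache size to affect throughput.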

Request: We are seeking a CPU Upgrade (8 vCPU / 32GB) or ZeroGPU access to host a real-time demo Space. This will allow the community to interact with a high-coherence ternary model and verify its bits-per-character (BPC) performance.
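
For anyone who wants to check a BPC figure independently, the metric can be computed from a model's summed negative log-likelihood. The sketch below assumes the loss is reported in nats (natural log) and summed over the whole evaluation text; `bits_per_character` is a hypothetical helper, not part of the FlashLM repository. For a token-level model, the divisor is still the number of characters, not the number of tokens.

```python
import math


def bits_per_character(total_nll_nats, n_chars):
    """Convert a summed negative log-likelihood (in nats) to bits per character.

    Dividing by ln(2) converts nats to bits; dividing by the character
    count normalizes across tokenizers.
    """
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    return total_nll_nats / (n_chars * math.log(2))


# Sanity check with a known answer: a model that assigns probability 1/4
# to every character pays ln(4) nats per character, i.e. about 2 bits.
n_chars = 10
total_nll = n_chars * math.log(4)
bpc = bits_per_character(total_nll, n_chars)
```

Lower is better; the uniform-over-four-symbols case above is only a sanity check for the unit conversion, not a FlashLM result.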

Links:

Repo: github.com/changcheng967/FlashLM

Model Hub: huggingface.co/changcheng967/flashlm-v4-bolt
