---
title: Small Cuts
emoji: 🎬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 6.18.0
app_file: app.py
pinned: true
license: mit
short_description: A deadpan narrator for your life, from small open models.
tags:
  - track:wood
  - achievement:offgrid
  - achievement:offbrand
  - achievement:llama
  - achievement:fieldnotes
---

# Small Cuts 🎬

> *"And that was the moment Carlos realized the coffee had been decaf all along."*

**Small Cuts** turns first-person moments into grounded, cinematic, **spoken** narration (an omniscient, slightly-too-honest narrator in the spirit of *The Invention of Lying*) using only **small (≤32B) open models**. No script, no cloud LLM: a small vision-language model watches your moment and a small TTS speaks the line, the way a film narrator would if your life were the film.

There is exactly **one narrator**: a single deadpan, unnamed voice. No menus, no director to pick. You point at what's happening; it tells you what it means.

This is the **challenger submission** for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon) ("Small Models, Big Adventures", Gradio × Hugging Face; submissions close **June 15, 2026, 23:59 UTC**), the strategic successor to the original *Director's Cut* project.

## The soul of it: the Action-to-Cut loop

Small Cuts was born wearing glasses. The intended experience is a **live loop**:

```
Ray-Ban Meta glasses ──image frames──▶ home engine (small VLM + TTS) ──▶ narration in your ear
                                             │
                                             └──── finished cuts ────▶ the Space (watch · library)
```

You walk through a moment, tap **Action!**, then tap **Cut!** when the scene has a readable beat. The narrator watches a selected first-person frame and speaks one grounded, deadpan line back in your ear while the moment is still *recent past*. The finished cut then lands in the Space as a short POV clip with synced captions, title, voice, and library thumbnail.

**One completed-cut experience, multiple inputs:** from the Space's point of view, glasses cuts and authenticated browser uploads resolve to the same artifact shape: a finished video, generated title, generated narration, Kokoro voice, synced captions, and a library tile. Glasses remain the private wearer path; browser uploads are a judge-verifiable path with no glasses or iOS required.

## What's in this Space

The Space is the **view platform + library** half of the loop, a small streaming-channel UI:

- **A live stage** with the current moment and **movie-style subtitles** (short phrase-sized lines over a constant dark bar, advancing with the voice-over).
- **Voice-over replay**, with a compact custom player whose video, sound, captions, and progress share the same audio clock.
- **A public library** of real Ray-Ban Meta glasses moments, generated through the same local engine path so the channel is never empty. The source clips and mark points are curated; the visible titles, narration, voice, thumbnails, and clips are produced by Small Cuts.
- **"Try it"**: a tucked-away, HF-login upload drawer that sends your short video to a private Modal post-cut service, then replays the generated cut in the same theater.

## How it was built

| Piece | Choice | Why |
|---|---|---|
| Narrator (VLM) | `Qwen/Qwen3-VL-8B-Instruct` | Strong grounded captioning at 8B, well under 32B |
| Voice (TTS) | **Kokoro** (24 kHz) | Tiny, expressive, open; one signature deadpan delivery |
| Space runtime | Gradio 6 on CPU | Public theater + library; uploads call Modal instead of warming models |
| Judge upload service | Modal GPU app (`small-cuts-postcut`) | Finished-video verification path with real Qwen + Kokoro output |
| Local live engine | FastAPI WS home node, **llama.cpp** | The in-ear loop + demo video; no cloud LLM/TTS API |
| Capture | iOS app for Ray-Ban Meta glasses (`ios/SmallCuts/`) | First-person moments, the way it's meant to be lived |

Built by **Carlos Crespo Macaya** as architect and lead. Development was accelerated with an AI toolchain: Claude (Opus) for design critique, Codex (GPT-5.x) for paired implementation, GLM for review, and Gemini for eval, all directed by Carlos.

## Hackathon compliance

| Rule | How Small Cuts complies |
|---|---|
| Gradio app hosted as a Space under the org | The app **is** the product: this Space |
| Every model < 32B | 8B VLM narrator + small Kokoro TTS, all open weights |
| Demo video | Filmed POV with Ray-Ban Meta glasses → narrated by the app *(pending final link below)* |
| Social post | Linked from this README *(pending final link below)* |
| Track 2: **Thousand Token Wood** (`track:wood`) | Whimsical, delightful, AI-load-bearing, original |
| Off the Grid (`achievement:offgrid`) | Live inference/TTS runs on local hardware; public Space reads finished cuts only |
| Llama (`achievement:llama`) | The live engine runs through `llama.cpp` |

- 📹 **Demo video:** _TODO: add public link before submission_
- 📣 **Social post:** _TODO: add link before submission_
- 📝 **Field notes:** [hf.co/blog/macayaven/small-cuts-field-notes](https://huggingface.co/blog/macayaven/small-cuts-field-notes)

**Bonus quests claimed:** Off-Brand
(`offbrand`, custom cinematic frontend) · Off the Grid (`offgrid`, local small-model engine for the live loop) · Llama (`llama`, llama.cpp) · Field Notes (`fieldnotes`, the write-up above).

## Quick start

```bash
# install (CI-equivalent minimal)
uv sync --extra dev

# run the Space/viewer locally with the deterministic mock backend (no model download)
SMALL_CUTS_BACKEND=mock uv run --no-sync python app.py

# run with the real local VLM (downloads weights)
SMALL_CUTS_BACKEND=transformers uv run --no-sync python app.py

# run the real-time engine (needs `brew install llama.cpp`)
SMALL_CUTS_BACKEND=llama_cpp SMALL_CUTS_TTS_BACKEND=kokoro uv run python -m small_cuts.engine

# run the hybrid relay + Modal upload Space locally (token comes from your local secret env)
SMALL_CUTS_RELAY_BUCKET=build-small-hackathon/small-cuts-scenes \
SMALL_CUTS_RELAY_PREFIX=relay \
SMALL_CUTS_ENABLE_UPLOAD_SANDBOX=1 \
SMALL_CUTS_MODAL_API_URL=https://macayaven--small-cuts-postcut-api.modal.run \
uv run --no-sync python app.py

# the gate (mirrors CI exactly)
uv run ruff check . && uv run ruff format --check . && uv run pytest
```

## Repository map

- `app.py`: Hugging Face Space entrypoint (Gradio CPU viewer/library)
- `src/small_cuts/`: the product: `viewer.py` (streaming viewer), `narrator.py` (VLM backends), `tts.py` (Kokoro), `styles.py` (grounded prompt), `engine/` (real-time home node), `seed_media/`
- `ios/SmallCuts/`: the Ray-Ban Meta glasses capture app
- `docs/`: [hackathon rules](docs/hackathon-rules.md) · [architecture](docs/product/architecture.md) · [contracts](docs/contracts/) · [progress](docs/progress.md)
- `CLAUDE.md`: operational conventions (the canonical command list lives here)

## Engineering discipline

- `main` is protected (PR-based workflow); CI runs ruff lint + format check, pytest, and a gitleaks secret scan on every push/PR.
- **No secrets in the repo, ever.** Secrets live in 1Password Connect (local dev) and HF Space secrets (deployment). Client-facing endpoints use Tailnet MagicDNS HTTPS, never raw IPs.
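
The two rules above (secrets come from the environment, and client-facing endpoints are HTTPS hostnames rather than raw IPs) could be enforced at startup with a small guard. The sketch below is illustrative only: `require_secret` and `check_endpoint` are hypothetical helpers, not functions from this repository, and the hostname in the usage line is a made-up placeholder.

```python
import ipaddress
import os
from urllib.parse import urlsplit


def require_secret(name: str) -> str:
    """Read a secret from the environment and fail loudly if it is missing.

    Hypothetical helper: the value is expected to be injected by the
    environment (for example a Space secret), never written in the repo.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value


def check_endpoint(url: str) -> str:
    """Accept only HTTPS URLs whose host is a DNS name, not an IP literal."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"endpoint must use https: {url}")
    host = parts.hostname or ""
    if not host:
        raise ValueError(f"endpoint has no host: {url}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return url  # host is not an IPv4/IPv6 literal, so treat it as a hostname
    raise ValueError(f"endpoint must not be a raw IP: {url}")


# Placeholder hostname, for illustration only.
engine_url = check_endpoint("https://engine.example.ts.net/ws")
```

Failing at startup keeps a misconfigured endpoint or a missing token from surfacing later as a confusing runtime error in the middle of a cut.
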