---
title: Agentic Space Factory
emoji: 🏭
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
python_version: "3.11"
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
hf_oauth_scopes:
  - read-repos
  - write-repos
  - manage-repos
  - gated-repos
  - inference-api
  - jobs
  - read-billing
---

# Agentic Space Factory — V9 LongCat Full-Inference Gate

This version validates the safe foundation for a Hugging Face-native “Agentic Space Factory”.

It now supports two phases:

```text
Phase 1:
Gradio Space OAuth user
→ launch Hugging Face Job with the user's OAuth token
→ mount private Storage Bucket
→ write run state/events/report
→ read run status back in the orchestrator UI

Phase 2.1:
Gradio Space OAuth user
→ launch Hugging Face Job
→ create private target Gradio Space in the user's namespace
→ upload app.py / requirements.txt / README.md
→ validate the live Space through gradio_client
→ write run state/events/report to the Bucket
```

The configured bucket is:

```text
hf://buckets/fffiloni/space-factory-runs
```

## What this version does

- Enables Hugging Face OAuth in a Gradio Space.
- Requests the `jobs` scope, plus repo/inference scopes needed by later phases.
- Launches CPU Hugging Face Jobs using `huggingface_hub.run_job`.
- Mounts `fffiloni/space-factory-runs` as `/output` in the Job.
- Passes the OAuth token to the Job as an encrypted secret, not as a CLI argument.
- Phase 1 writes these files in the bucket:

```text
runs//state.json
runs//events.jsonl
runs//report.md
```

- Phase 2.1 additionally creates a private target Space and stores:

```text
runs//target_space.json
runs//generated/app.py
runs//generated/requirements.txt
runs//generated/README.md
runs//tests/api_schema.json
runs//tests/test_result.json
```

## What this version does not do yet

- It does not run Pi yet.
- It does not analyze model cards yet.
- It does not configure ZeroGPU yet.
- It does not publish anything publicly.
- It does not overwrite existing target Spaces.

Those are intentionally left for the next increments once OAuth → Jobs → Bucket → private Space creation → live API validation is confirmed.

## Configuration

Default values are in `src/config.py`. You can override them with Space variables:

```bash
SPACE_FACTORY_BUCKET_SOURCE=fffiloni/space-factory-runs
SPACE_FACTORY_BUCKET_MOUNT=/output
SPACE_FACTORY_JOB_FLAVOR=cpu-basic
SPACE_FACTORY_JOB_TIMEOUT=15m
SPACE_FACTORY_JOB_IMAGE=python:3.12
```

For Phase 2.1, a 15-minute timeout is usually enough for a tiny Gradio Space. Increase it if Space builds are slow.

## Local notes

OAuth injection only works inside a Hugging Face Space with `hf_oauth: true`. For local UI development, the app can render, but launching a Job requires a real OAuth token passed by Gradio in a Space.

## Security posture

- No global admin token is required.
- The user's OAuth token is used only to launch the Job and is passed to the Job as a secret.
- The worker script never prints the token.
- The target bucket should remain private.
- Phase 2.1 target Spaces are private by default.
- Raw traces and future Pi sessions must stay private by default.

## Phase 3 — Pi smoke test - Phase 4 — Pi gist recipe

This phase installs `@mariozechner/pi-coding-agent` inside the HF Job, configures Pi with Hugging Face Inference Providers using the OAuth token, and asks Pi to make one safe edit to a generated Gradio app before creating the private target Space.

Expected run artifacts:

```text
runs//generated/app.py
runs//logs/pi_output.txt
runs//traces/raw/*.jsonl
runs//traces/redacted/*.jsonl
runs//tests/test_result.json
```

The target Space remains private by default. Success is only declared after the live Gradio API returns the expected Pi-modified output.

## Phase 4

Phase 4 asks Pi to follow the HF Spaces Agent Quickstart gist and use the `hf` CLI inside an HF Job to create/upload a private Space. The wrapper independently validates the live Gradio API before reporting success.
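
The wrapper-owned validation described in Phases 3 and 4 could look roughly like the sketch below. It is not the project's actual worker code: `validate_live_api`, the `call_api` callable, and the JSON field names are hypothetical. In a real run, `call_api` would presumably wrap a `gradio_client` call against the private target Space; here it is injected so the gate logic can be exercised without network access.

```python
import json
from pathlib import Path
from typing import Callable


def validate_live_api(
    call_api: Callable[[str], str],
    prompt: str,
    expected_marker: str,
    result_path: Path,
) -> bool:
    """Call the live API once and record whether the expected marker came back.

    The result file layout is an assumption, not the project's real schema.
    """
    try:
        output = call_api(prompt)
        passed = expected_marker in str(output)
        error = None
    except Exception as exc:  # e.g. Space still building, auth or network failure
        output, passed, error = None, False, repr(exc)

    result = {
        "prompt": prompt,
        "expected_marker": expected_marker,
        "output": None if output is None else str(output),
        "passed": passed,
        "error": error,
    }
    # Persist the outcome so the orchestrator can read it back later.
    result_path.parent.mkdir(parents=True, exist_ok=True)
    result_path.write_text(json.dumps(result, indent=2))
    return passed
```

The point of keeping this check in the wrapper rather than in Pi is that the agent cannot declare its own success: the run is only marked as passed when the wrapper itself observes the expected output.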
## V5

Adds Phase 5: model-card analysis for simple Transformers text pipeline models. Recommended first test: `sshleifer/tiny-gpt2`. The Space remains private and success is still gated by wrapper-owned live API validation.

## Phase 6

Adds a no-build runtime recommender Job that analyzes model metadata and writes `runtime_recommendation.json` to the Bucket.

## Phase 8 — LongCat article reproduction

Phase 8 attempts an article-style LongCat Space build: private target Space, Pi-guided app adaptation, ZeroGPU first, fixed GPU fallback when explicitly enabled, and live `/health` API validation. Full video generation remains a manual-review step until model-specific runtime validation is complete.

## V8 LongCat robust changes

V8 focuses on the issues discovered during the first LongCat runs:

- validates a cheap HTTP `GET /health` route before falling back to `gradio_client`;
- collects best-effort Space runtime/log diagnostics into the Bucket;
- treats `request_space_hardware` as best-effort because OAuth tokens may create/write Spaces but still fail on paid hardware changes;
- stops retrying hardware on clear 401/auth failures and marks manual hardware action as required;
- uploads the entire Pi workspace recursively, so generated packages such as `longcat_video/` are preserved;
- defaults the Pi model field to `Qwen/Qwen3-Coder-Next`, while keeping it editable;
- adds an implementation mode: `full-inference-attempt` or `safe-scaffold`.

## V9 LongCat full-inference gate

V9 keeps the robust V8 health validation but changes the meaning of success for LongCat-style runs. A Space that only boots and exposes `/health` is no longer treated as a full inference reproduction.

Key behavior:

- default implementation mode is `full-inference-gated`;
- Pi is instructed not to silently replace generation with a docs-only placeholder;
- if real inference cannot be wired, Pi should produce `TECHNICAL_BLOCKERS.json`;
- the worker writes `inference_gate.json` with a status such as `technical_blocker`, `health_only`, or `full_inference_candidate_health_passed`;
- Pi must investigate SDPA, xformers, and HF Kernels flash-attn alternatives before declaring flash-attn a hard blocker;
- hardware changes remain best-effort because OAuth tokens may create/write Spaces but fail on paid hardware changes.

This phase is designed to distinguish “bootable scaffold” from “functional model reproduction”.
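
As an illustration of how such a gate might separate the two outcomes, here is a minimal sketch. The mapping from observations to statuses is an assumption, not the worker's actual logic: the function names, the `health_failed` status, the JSON fields, and the locations of `TECHNICAL_BLOCKERS.json` and `inference_gate.json` are all hypothetical. Only the three statuses named in the list above come from this README.

```python
import json
from pathlib import Path


def decide_gate_status(mode: str, health_passed: bool, has_blockers: bool) -> str:
    """Map run observations to a gate status (assumed mapping)."""
    if has_blockers:
        # Pi reported that real inference could not be wired.
        return "technical_blocker"
    if not health_passed:
        # Hypothetical status: the README names none for a failed health check.
        return "health_failed"
    if mode == "full-inference-gated":
        # Health passed and no blockers were reported: still only a candidate.
        return "full_inference_candidate_health_passed"
    # Any other mode that merely boots is treated as a bootable scaffold.
    return "health_only"


def write_inference_gate(
    run_dir: Path, workspace: Path, mode: str, health_passed: bool
) -> dict:
    """Write inference_gate.json into run_dir and return its content."""
    # Assumed location: Pi leaves the blockers file at the workspace root.
    blockers_file = workspace / "TECHNICAL_BLOCKERS.json"
    gate = {
        "mode": mode,
        "health_passed": health_passed,
        "status": decide_gate_status(mode, health_passed, blockers_file.exists()),
    }
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "inference_gate.json").write_text(json.dumps(gate, indent=2))
    return gate
```

Checking for the blockers file first means a Space that boots cleanly but ships a declared blocker is never reported as a full-inference candidate.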