# Architecture

Agentic Space Factory is a Hugging Face-native implementation of the local agent workflow described in the ZeroGPU Spaces article.

```text
User
  → Gradio orchestrator Space with HF OAuth
  → ephemeral HF Job
  → Pi coding agent + HF Inference Providers model
  → generated private target Space
  → Storage Bucket run record
  → live validation job when hardware is ready
```

## Components

### Orchestrator Space

The public UI has two workflows:

1. **Build from model card**: starts an HF Job that analyzes a model card and asks Pi to generate a private Gradio Space.
2. **Validate existing Space**: starts a separate HF Job that smoke-tests a generated Space after hardware has been configured, measures latency, and stores the output artifact.

The orchestrator never stores a global admin token. It uses the signed-in user's HF OAuth token.
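
The per-user token rule can be sketched as below. This is a minimal illustration, not the project's code: `JobCredentials` and `credentials_for_run` are hypothetical names, and the only assumption about the token object is that it exposes a `.token` string, as Gradio's `gr.OAuthToken` does.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class HasToken(Protocol):
    """Anything exposing a `.token` string, e.g. Gradio's `gr.OAuthToken`."""

    token: str


@dataclass(frozen=True)
class JobCredentials:
    """Per-run credentials (hypothetical type); built per request, never cached."""

    hf_token: str


def credentials_for_run(oauth_token: Optional[HasToken]) -> JobCredentials:
    """Derive job credentials from the signed-in user's token only."""
    if oauth_token is None or not oauth_token.token:
        # There is no global admin token to fall back on: the user must sign in.
        raise PermissionError("Sign in with Hugging Face before starting a run.")
    return JobCredentials(hf_token=oauth_token.token)
```

Because the function has no fallback branch, a missing sign-in fails loudly instead of silently running with broader privileges.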

### HF Jobs

Jobs do the long-running work: installing Pi, generating code, creating/uploading the target Space, checking runtime state, and running live validations.

The builder job is allowed to create a private Space and upload generated files. Hardware assignment is attempted on a best-effort basis only. If ZeroGPU or fixed-GPU assignment fails because of quota, billing, or OAuth limits, the run is marked as requiring manual hardware.

### Pi + coding model

Pi runs inside the Job and uses a model such as `Qwen/Qwen3-Coder-Next` through Hugging Face Inference Providers. It receives a strict goal:

- generate a Gradio app from the model card;
- keep the Space private;
- add `/health` and generation endpoints where possible;
- do not mark placeholders as full inference;
- write blockers if full inference is impossible.

### Storage Bucket

Every run writes to the configured Bucket:

```text
runs/<run_id>/state.json
runs/<run_id>/events.jsonl
runs/<run_id>/report.md
runs/<run_id>/generated/
runs/<run_id>/tests/
runs/<run_id>/artifacts/
runs/<run_id>/traces/
```
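
A small helper can keep every writer on the same layout. The sketch below uses a local directory as a stand-in for the Bucket; `run_paths`, `append_event`, and the event fields (`ts`, `kind`) are illustrative assumptions rather than the project's actual schema.

```python
import json
import time
from pathlib import Path


def run_paths(root: Path, run_id: str) -> dict:
    """Return the per-run locations listed above, relative to `root`."""
    base = root / "runs" / run_id
    return {
        "state": base / "state.json",
        "events": base / "events.jsonl",
        "report": base / "report.md",
        "generated": base / "generated",
        "tests": base / "tests",
        "artifacts": base / "artifacts",
        "traces": base / "traces",
    }


def append_event(root: Path, run_id: str, kind: str, **fields) -> None:
    """Append one JSON object per line to events.jsonl (assumed schema)."""
    events = run_paths(root, run_id)["events"]
    events.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "kind": kind, **fields}
    with events.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Appending to `events.jsonl` line by line means a crashed job still leaves a readable partial history.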

### Target Space

Generated Spaces are private by default. When ZeroGPU is selected, the builder attempts it first and then tries the optional fixed-GPU fallback. If both fail, the Space can still be configured manually in Settings and then validated with the second workflow.
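
The fallback order can be sketched as a loop over candidate hardware flavors. This is an illustration under assumptions: `assign_hardware` and the `hardware_assigned` status are hypothetical, and `request` is an injected callable that in practice might wrap something like `HfApi.request_space_hardware` from `huggingface_hub`.

```python
from typing import Callable, Sequence


def assign_hardware(request: Callable[[str], None], candidates: Sequence[str]) -> dict:
    """Try each hardware flavor in order; never raise on failure."""
    errors = []
    for flavor in candidates:
        try:
            request(flavor)
        except Exception as exc:  # e.g. quota, billing, or OAuth scope errors
            errors.append(f"{flavor}: {exc}")
            continue
        # "hardware_assigned" is a placeholder status for this sketch.
        return {"status": "hardware_assigned", "hardware": flavor, "errors": errors}
    # Every attempt failed: leave the Space for manual configuration.
    return {"status": "manual_hardware_required", "hardware": None, "errors": errors}
```

Collecting the error messages rather than raising keeps the run alive and gives the report something concrete to show the user.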

## Result lifecycle

```text
Build from model card
  → generated private Space
  → ZeroGPU/fixed GPU best-effort
  → health/API gate
  → manual_hardware_required or candidate status

Validate existing Space
  → call /generate or configured endpoint
  → verify output type
  → measure latency
  → save artifact
  → full_inference_success when output is valid
```
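
The validation path can be sketched as a single timed call followed by a type check. The names here are assumptions for illustration: `validate_space` is hypothetical, `call_endpoint` stands in for whatever client calls `/generate`, and `validation_failed` is a placeholder for the non-success outcome, which this document does not name.

```python
import time
from typing import Any, Callable


def validate_space(call_endpoint: Callable[[], Any], expected_type: type) -> dict:
    """Call the endpoint once, time it, and check the output type."""
    started = time.perf_counter()
    try:
        output = call_endpoint()
        error = None
    except Exception as exc:
        output, error = None, str(exc)
    latency_s = time.perf_counter() - started

    valid = error is None and isinstance(output, expected_type)
    return {
        # "validation_failed" is a placeholder status for this sketch.
        "status": "full_inference_success" if valid else "validation_failed",
        "latency_s": round(latency_s, 3),
        "output_type": type(output).__name__,
        "error": error,
    }
```

The returned dictionary is the kind of record that could be written under `runs/<run_id>/artifacts/` alongside the raw output.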

## Known limits

- Automatic paid hardware assignment through OAuth may fail; manual hardware selection is supported.
- ZeroGPU may be unavailable because of quota or namespace limits.
- Models that need multi-GPU hardware, Docker-only runtimes, ComfyUI, custom CUDA/FlashAttention builds, external API keys, or gated access may require manual intervention or end in `technical_blocker`.