Space: ltx-community/ltx2-lora-trainer

Commit b325d81 (verified) by linoyts (HF Staff), committed 7 days ago · 1 parent: dc0ad54

OAuth login (no pasted tokens) + ai-toolkit-style UI + generic IC-LoRA (paired references, drop Canny)

Files changed (4)

README.md +29 -13
app.py +132 -61
jobs.py +51 -22
requirements.txt +1 -1

README.md CHANGED

@@ -8,6 +8,12 @@ sdk_version: 6.18.0
 app_file: app.py
 pinned: false
 hardware: cpu-basic
 ---
 # LTX-2 LoRA Trainer (HF Jobs)
@@ -15,9 +21,11 @@ hardware: cpu-basic
 Train a **LoRA / IC-LoRA for LTX-2.3** from your own videos, entirely on Hugging Face
 infrastructure:
 - the **Space** (this app, `cpu-basic`) collects your videos + hyperparameters and submits a job;
-- training runs on **HF Jobs** (`a100-large`), reproducing the trainer environment from the
-  monorepo lockfile (`uv sync --frozen`);
 - your dataset is staged on **HF buckets** (`hf://buckets/<you>/ltx2-train-<run>`);
 - the trained LoRA is pushed to the Hub model repo you choose.
@@ -25,21 +33,29 @@ You only pay for the Job's actual GPU runtime.
 ## Setup
-1. **Space secret `HF_TOKEN`** — a write token (Settings → Variables and secrets). Used to
-   create buckets, submit the job, download the gated Gemma encoder, and push the LoRA.
-2. The job pulls the trainer source from the bucket **`LTX_SRC_BUCKET`**
-   (default `linoyts/ltx2-trainer-src`). To use your own, upload the monorepo
-   (`pyproject.toml`, `uv.lock`, `packages/`) to a bucket and set the `LTX_SRC_BUCKET`
-   Space variable.
 ## Usage
-1. Upload videos (or a `.zip`) and one caption per line (alphabetical by filename).
-2. Pick a mode (IC-LoRA Canny / T2V / I2V), set hyperparameters, and a Hub model id.
-3. **Submit training job** → copy the job id into the monitor box and **Refresh** to stream logs.
-The job: sync buckets → `uv sync` → download base checkpoint + Gemma → (Canny references) →
-`process_dataset.py` → `train.py` → push LoRA to the Hub.
 ## Notes

 app_file: app.py
 pinned: false
 hardware: cpu-basic
+hf_oauth: true
+hf_oauth_scopes:
+  - read-repos
+  - write-repos
+  - manage-repos
+  - jobs
 ---
 # LTX-2 LoRA Trainer (HF Jobs)
 Train a **LoRA / IC-LoRA for LTX-2.3** from your own videos, entirely on Hugging Face
 infrastructure:
+- **Sign in with Hugging Face** — the dataset, the job, and the pushed LoRA all run under
+  **your** account and billing (no pasted tokens);
 - the **Space** (this app, `cpu-basic`) collects your videos + hyperparameters and submits a job;
+- training runs on **HF Jobs**, reproducing the trainer environment from the monorepo
+  lockfile (`uv sync --frozen`);
 - your dataset is staged on **HF buckets** (`hf://buckets/<you>/ltx2-train-<run>`);
 - the trained LoRA is pushed to the Hub model repo you choose.
 ## Setup
+Just **sign in with Hugging Face** in the app — the OAuth token (scopes: repos + `jobs`) is
+used to create buckets, submit the job, download the gated Gemma encoder, and push the LoRA,
+all as the signed-in user. No Space secret is required.
+The job pulls the trainer source from the bucket **`LTX_SRC_BUCKET`**
+(default `linoyts/ltx2-trainer-src`). To use your own, upload the monorepo
+(`pyproject.toml`, `uv.lock`, `packages/`) to a bucket and set the `LTX_SRC_BUCKET`
+Space variable.
 ## Usage
+1. Sign in with Hugging Face.
+2. Pick a mode and upload your videos:
+   - **IC-LoRA (in-context control):** upload each target `clip.mp4` **and** its control video
+     `clip_reference.mp4` (depth, pose, edges, an inpainting-masked version, …) — matched by
+     filename. References are user-supplied; nothing is auto-derived.
+   - **T2V / I2V:** upload only the target clips.
+3. Add one caption per line (in filename order of the target clips), set hyperparameters and a
+   Hub model id, then **Submit training job**.
+4. Copy the job id into the monitor box and **Refresh** to stream logs.
+The job: sync buckets → `uv sync` → download base checkpoint + Gemma → `process_dataset.py`
+→ `train.py` → push LoRA to the Hub.
 ## Notes
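The new Usage steps tie captions to clips by position: line *i* goes to the *i*-th target clip in sorted filename order, and in IC-LoRA mode the `*_reference` clips are skipped because they are paired, not captioned. The sketch below previews that mapping before submitting. It mirrors the logic of `build_dataset_items` in `jobs.py` (including the `"a video"` fallback for a missing or blank line), but `preview_caption_alignment` is a hypothetical helper, and the extension set is an assumption because the real `VIDEO_EXTS` is not shown in this diff.

```python
from pathlib import PurePath

# Assumption: the actual VIDEO_EXTS in jobs.py is not shown in this diff.
VIDEO_EXTS = {".mp4", ".mov", ".webm", ".mkv"}
REF_SUFFIX = "_reference"


def preview_caption_alignment(
    filenames: list[str], captions_text: str, needs_reference: bool
) -> list[tuple[str, str]]:
    """Hypothetical helper: show which caption line each target clip would receive."""
    captions = captions_text.splitlines()
    videos = [n for n in filenames if PurePath(n).suffix.lower() in VIDEO_EXTS]
    if needs_reference:
        # IC-LoRA: reference clips are paired with targets, so they take no caption line.
        videos = [n for n in videos if not PurePath(n).stem.endswith(REF_SUFFIX)]
    pairs = []
    for i, name in enumerate(sorted(videos)):
        # Same fallback as build_dataset_items: a missing or blank line becomes "a video".
        cap = captions[i] if i < len(captions) and captions[i].strip() else "a video"
        pairs.append((name, cap))
    return pairs


uploads = ["dog.mp4", "cat.mp4", "cat_reference.mp4", "dog_reference.mp4"]
print(preview_caption_alignment(uploads, "a fluffy cat\na running dog", needs_reference=True))
# [('cat.mp4', 'a fluffy cat'), ('dog.mp4', 'a running dog')]
```

Note that the upload order ("dog" before "cat") does not matter; only the sorted filenames do.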

app.py CHANGED

@@ -1,41 +1,66 @@
 """LTX-2 LoRA Trainer — HF Space (HF Jobs + buckets).
-Upload videos, set hyperparameters, and submit a training job to HF Jobs. The job runs
-on a100-large, reproduces the trainer env from the lockfile, trains a LoRA/IC-LoRA for
-LTX-2.3, and pushes it to your Hub repo. Datasets are staged on HF buckets.
 Runs on `cpu-basic` — the Space only submits + monitors jobs (no GPU, no torch here).
 """
 from __future__ import annotations
-import os
 import gradio as gr
 import jobs
-HEADER = """# 🎬 LTX-2 LoRA Trainer (HF Jobs)
-Train a LoRA / IC-LoRA for **LTX-2.3** from your own videos — training runs on **HF Jobs**
-(a100-large), data is staged on **HF buckets**, and the trained LoRA is pushed to your Hub repo.
-You only pay for the Job's actual runtime. Set `HF_TOKEN` as a Space secret (or paste a token below).
-"""
 FLAVORS = ["a100-large", "a100x4", "l40sx1", "l40sx4"]
 MAX_LOG = 60_000
-def _extract_id(link_md: str) -> str:
-    import re
-    m = re.search(r"\[([^\]]+)\]\(http", link_md or "")
-    return m.group(1) if m else ""
 def submit_job(
     files, captions_text, run_name, mode, resolution, rank, alpha, lr, steps, batch_size,
-    grad_accum, quantization, optimizer_type, te_8bit, gen_canny, push, hub_id, hf_token, flavor, timeout,
 ):
     if not files:
         return "❌ Upload at least one video.", "", ""
     try:
@@ -43,7 +68,7 @@ def submit_job(
     except ValueError as e:
         return f"❌ {e}", "", ""
     if push and not hub_id.strip():
-        return "❌ Set a Hub model id (username/my-lora) or disable push.", "", ""
     params = {
         "run_name": run_name or "ltx2-lora", "mode": mode, "resolution": resolution.strip(),
@@ -51,8 +76,8 @@ def submit_job(
         "gradient_accumulation_steps": grad_accum,
         "quantization": None if quantization in ("none", None) else quantization,
         "optimizer_type": optimizer_type, "load_text_encoder_in_8bit": bool(te_8bit),
-        "generate_canny_reference": bool(gen_canny), "push_to_hub": bool(push),
-        "hub_model_id": hub_id.strip(), "hf_token": (hf_token or os.environ.get("HF_TOKEN", "")).strip(),
         "captions": captions_text.splitlines() if captions_text else [], "seed": 42,
     }
     try:
@@ -60,7 +85,7 @@ def submit_job(
     except Exception as e:  # noqa: BLE001
         return f"❌ Submission failed: {e}", "", ""
-    status = f"✅ Job submitted on **{flavor}**."
     link = ""
     if res["url"]:
         link = f"**Job:** [{res['job_id']}]({res['url']})  \n**Bucket:** `{res['bucket']}`"
@@ -69,67 +94,113 @@ def submit_job(
     return status, link, res["log"]
-def refresh(job_id, hf_token):
     if not job_id.strip():
         return "Enter a job id.", ""
-    token = (hf_token or os.environ.get("HF_TOKEN", "")).strip()
     st = jobs.job_status(job_id.strip(), token)
     logs = jobs.job_logs(job_id.strip(), token)
-    return f"**Status:** {st}", logs[-MAX_LOG:] if len(logs) > MAX_LOG else logs
-with gr.Blocks(title="LTX-2 LoRA Trainer (HF Jobs)") as demo:
-    gr.Markdown(HEADER)
     with gr.Row():
-        with gr.Column(scale=1):
-            run_name = gr.Textbox(label="Run name", value="ltx2-ic-lora")
-            mode = gr.Dropdown(list(jobs.MODES.keys()), value=list(jobs.MODES.keys())[0], label="Training mode")
-            files = gr.File(label="Videos (or a .zip)", file_count="multiple", file_types=["video", ".zip"])
-            captions = gr.Textbox(label="Captions (one per line, alphabetical by filename)", lines=4)
-            gen_canny = gr.Checkbox(label="Auto-generate Canny references (IC-LoRA)", value=True)
-            with gr.Accordion("Hyperparameters", open=True):
-                resolution = gr.Textbox(label="Resolution WxHxF", value="512x320x25")
                 with gr.Row():
                     rank = gr.Number(label="LoRA rank", value=32, precision=0)
                     alpha = gr.Number(label="LoRA alpha", value=32, precision=0)
                 with gr.Row():
                     lr = gr.Number(label="Learning rate", value=2e-4)
                     steps = gr.Number(label="Steps", value=2000, precision=0)
-                with gr.Row():
-                    batch_size = gr.Number(label="Batch size", value=1, precision=0)
-                    grad_accum = gr.Number(label="Grad accumulation", value=1, precision=0)
-                with gr.Row():
-                    quantization = gr.Dropdown(["none", "int8-quanto", "fp8-quanto"], value="none",
-                                               label="Quantization", info="a100-large (80GB) fits 22B in bf16.")
-                    optimizer_type = gr.Dropdown(["adamw", "adamw8bit"], value="adamw", label="Optimizer")
-                te_8bit = gr.Checkbox(label="Load text encoder in 8-bit", value=False)
-            with gr.Accordion("Hub & hardware", open=True):
-                push = gr.Checkbox(label="Push trained LoRA to the Hub", value=True)
-                hub_id = gr.Textbox(label="Hub model id", value="linoyts/ltx2-ic-lora", placeholder="username/my-lora")
-                hf_token = gr.Textbox(label="HF token (blank → Space secret HF_TOKEN)", type="password")
                 with gr.Row():
                     flavor = gr.Dropdown(FLAVORS, value=jobs.DEFAULT_FLAVOR, label="GPU flavor")
                     timeout = gr.Textbox(label="Timeout", value="4h")
-            submit_btn = gr.Button("🚀 Submit training job", variant="primary")
-        with gr.Column(scale=1):
-            status = gr.Markdown("Ready.")
-            joblink = gr.Markdown("")
-            sublog = gr.Textbox(label="Submission output", lines=6)
-            gr.Markdown("### Monitor a job")
-            with gr.Row():
-                job_id = gr.Textbox(label="Job id", scale=3)
-                refresh_btn = gr.Button("🔄 Refresh", scale=1)
-            mon_status = gr.Markdown("")
-            mon_logs = gr.Textbox(label="Job logs", lines=18, autoscroll=True)
     submit_btn.click(
         submit_job,
         inputs=[files, captions, run_name, mode, resolution, rank, alpha, lr, steps, batch_size,
-                grad_accum, quantization, optimizer_type, te_8bit, gen_canny, push, hub_id, hf_token, flavor, timeout],
         outputs=[status, joblink, sublog],
-    ).then(lambda link: _extract_id(link), inputs=joblink, outputs=job_id)
-    refresh_btn.click(refresh, inputs=[job_id, hf_token], outputs=[mon_status, mon_logs])
 if __name__ == "__main__":
-    demo.queue(default_concurrency_limit=2).launch()

 """LTX-2 LoRA Trainer — HF Space (HF Jobs + buckets).
+Sign in with Hugging Face, upload videos + captions, set hyperparameters, and submit a
+training job to HF Jobs. The job runs on a GPU flavor, reproduces the trainer env from the
+lockfile, trains a LoRA / IC-LoRA for LTX-2.3, and pushes it to your Hub repo. Datasets are
+staged on HF buckets. Everything runs under the *signed-in user's* account — no pasted tokens.
 Runs on `cpu-basic` — the Space only submits + monitors jobs (no GPU, no torch here).
 """
 from __future__ import annotations
+import re
 import gradio as gr
 import jobs
 FLAVORS = ["a100-large", "a100x4", "l40sx1", "l40sx4"]
 MAX_LOG = 60_000
+MODE_KEYS = list(jobs.MODES.keys())
+CSS = """
+#hdr {text-align:left; padding:4px 2px 0 2px;}
+#hdr h1 {margin:0; font-size:1.7rem;}
+#hdr p {margin:.25rem 0 0 0; color:var(--body-text-color-subdued);}
+.section-card {border:1px solid var(--block-border-color); border-radius:14px;
+               padding:14px 16px; background:var(--block-background-fill);}
+.step-badge {font-weight:600; color:var(--primary-500);}
+#banner {border-radius:12px; padding:10px 14px; font-size:.95rem;}
+footer {visibility:hidden;}
+"""
+THEME = gr.themes.Soft(
+    primary_hue=gr.themes.colors.indigo,
+    secondary_hue=gr.themes.colors.purple,
+    radius_size=gr.themes.sizes.radius_lg,
+)
+def _signin_state(profile: gr.OAuthProfile | None):
+    """Banner + default hub id, recomputed on load and after sign-in."""
+    if profile is None:
+        return (
+            "🔒 **You're not signed in.** Use **Sign in with Hugging Face** (top-right) to start — "
+            "your dataset, the training job, and the resulting LoRA all run under **your** account "
+            "and billing. No tokens to paste.",
+            gr.update(),
+        )
+    return (
+        f"✅ Signed in as **{profile.username}** — jobs, buckets and the pushed LoRA will live under "
+        f"your account.",
+        gr.update(value=f"{profile.username}/ltx2-lora"),
+    )
 def submit_job(
     files, captions_text, run_name, mode, resolution, rank, alpha, lr, steps, batch_size,
+    grad_accum, quantization, optimizer_type, te_8bit, push, hub_id, flavor, timeout,
+    profile: gr.OAuthProfile | None, oauth_token: gr.OAuthToken | None,
 ):
+    if oauth_token is None or profile is None:
+        return "❌ Please **sign in with Hugging Face** first (top-right).", "", ""
     if not files:
         return "❌ Upload at least one video.", "", ""
     try:
     except ValueError as e:
         return f"❌ {e}", "", ""
     if push and not hub_id.strip():
+        return "❌ Set a Hub model id (e.g. you/my-lora) or disable push.", "", ""
     params = {
         "run_name": run_name or "ltx2-lora", "mode": mode, "resolution": resolution.strip(),
         "gradient_accumulation_steps": grad_accum,
         "quantization": None if quantization in ("none", None) else quantization,
         "optimizer_type": optimizer_type, "load_text_encoder_in_8bit": bool(te_8bit),
+        "push_to_hub": bool(push), "hub_model_id": hub_id.strip(),
+        "hf_token": oauth_token.token,
         "captions": captions_text.splitlines() if captions_text else [], "seed": 42,
     }
     try:
     except Exception as e:  # noqa: BLE001
         return f"❌ Submission failed: {e}", "", ""
+    status = f"✅ Job submitted on **{flavor}**, running as **{profile.username}**."
     link = ""
     if res["url"]:
         link = f"**Job:** [{res['job_id']}]({res['url']})  \n**Bucket:** `{res['bucket']}`"
     return status, link, res["log"]
+def refresh(job_id, oauth_token: gr.OAuthToken | None):
     if not job_id.strip():
         return "Enter a job id.", ""
+    token = oauth_token.token if oauth_token else ""
     st = jobs.job_status(job_id.strip(), token)
     logs = jobs.job_logs(job_id.strip(), token)
+    return f"**Status:** `{st}`", logs[-MAX_LOG:] if len(logs) > MAX_LOG else logs
+def _extract_id(link_md: str) -> str:
+    m = re.search(r"\[([^\]]+)\]\(http", link_md or "")
+    return m.group(1) if m else ""
+MODE_HELP = (
+    "**IC-LoRA (in-context control)** learns from *pairs*: a target video `clip.mp4` and its "
+    "control/reference video `clip_reference.mp4` (depth, pose, edges, an inpainting-masked "
+    "version, …). Upload both — they're matched by filename. **T2V / I2V** need only the target clips."
+)
+with gr.Blocks(title="LTX-2 LoRA Trainer") as demo:
+    with gr.Row(equal_height=True):
+        gr.Markdown(
+            "# 🎬 LTX-2 LoRA Trainer\n"
+            "Train a **LoRA / IC-LoRA for LTX-2.3** on your own videos. Training runs on "
+            "**HF Jobs**, data is staged on **HF buckets**, and the LoRA is pushed to your Hub — "
+            "you only pay for the GPU runtime.",
+            elem_id="hdr",
+        )
+        gr.LoginButton(scale=0, min_width=200)
+    banner = gr.Markdown(elem_id="banner")
     with gr.Row():
+        # ---------------------------------------------------------------- left: data + training
+        with gr.Column(scale=3):
+            with gr.Group():
+                gr.Markdown("<span class='step-badge'>STEP 1</span> &nbsp; **Dataset**")
+                mode = gr.Dropdown(MODE_KEYS, value=MODE_KEYS[0], label="Training mode")
+                gr.Markdown(MODE_HELP)
+                files = gr.File(
+                    label="Videos — targets + (for IC-LoRA) their *_reference videos, or a .zip",
+                    file_count="multiple", file_types=["video", ".zip"], height=160,
+                )
+                captions = gr.Textbox(
+                    label="Captions — one per line, in filename order of the target clips",
+                    lines=4, placeholder="a fluffy cat in a sunlit room\na sweeping green landscape\n…",
+                )
+            with gr.Group():
+                gr.Markdown("<span class='step-badge'>STEP 2</span> &nbsp; **Training settings**")
+                resolution = gr.Textbox(
+                    label="Resolution (W×H×F)", value="768x512x49",
+                    info="W,H divisible by 32 · frames F satisfy F % 8 == 1 (25, 49, 81, …)",
+                )
                 with gr.Row():
                     rank = gr.Number(label="LoRA rank", value=32, precision=0)
                     alpha = gr.Number(label="LoRA alpha", value=32, precision=0)
                 with gr.Row():
                     lr = gr.Number(label="Learning rate", value=2e-4)
                     steps = gr.Number(label="Steps", value=2000, precision=0)
+                with gr.Accordion("Advanced", open=False):
+                    with gr.Row():
+                        batch_size = gr.Number(label="Batch size", value=1, precision=0)
+                        grad_accum = gr.Number(label="Grad accumulation", value=1, precision=0)
+                    with gr.Row():
+                        quantization = gr.Dropdown(
+                            ["none", "int8-quanto", "fp8-quanto"], value="none", label="Quantization",
+                            info="a100-large (80 GB) fits 22B in bf16 — quantize only on smaller GPUs.",
+                        )
+                        optimizer_type = gr.Dropdown(["adamw", "adamw8bit"], value="adamw", label="Optimizer")
+                    te_8bit = gr.Checkbox(label="Load text encoder in 8-bit", value=False)
+        # ---------------------------------------------------------------- right: output + submit + monitor
+        with gr.Column(scale=2):
+            with gr.Group():
+                gr.Markdown("<span class='step-badge'>STEP 3</span> &nbsp; **Output & launch**")
+                run_name = gr.Textbox(label="Run name", value="ltx2-ic-lora")
+                push = gr.Checkbox(label="Push the trained LoRA to my Hub", value=True)
+                hub_id = gr.Textbox(label="Hub model id", placeholder="username/my-lora")
                 with gr.Row():
                     flavor = gr.Dropdown(FLAVORS, value=jobs.DEFAULT_FLAVOR, label="GPU flavor")
                     timeout = gr.Textbox(label="Timeout", value="4h")
+                submit_btn = gr.Button("🚀 Submit training job", variant="primary", size="lg")
+                status = gr.Markdown("")
+                joblink = gr.Markdown("")
+            with gr.Group():
+                gr.Markdown("**Monitor a job**")
+                with gr.Row():
+                    job_id = gr.Textbox(label="Job id", scale=3)
+                    refresh_btn = gr.Button("🔄 Refresh", scale=1)
+                mon_status = gr.Markdown("")
+                mon_logs = gr.Textbox(label="Job logs", lines=16, autoscroll=True, max_lines=16)
+    sublog = gr.Textbox(label="Submission output", lines=4, visible=False)
+    demo.load(_signin_state, inputs=None, outputs=[banner, hub_id])
     submit_btn.click(
         submit_job,
         inputs=[files, captions, run_name, mode, resolution, rank, alpha, lr, steps, batch_size,
+                grad_accum, quantization, optimizer_type, te_8bit, push, hub_id, flavor, timeout],
         outputs=[status, joblink, sublog],
+    ).then(_extract_id, inputs=joblink, outputs=job_id)
+    refresh_btn.click(refresh, inputs=[job_id], outputs=[mon_status, mon_logs])
 if __name__ == "__main__":
+    demo.queue(default_concurrency_limit=2).launch(theme=THEME, css=CSS)
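The Resolution field's help text states the rule for a `WxHxF` spec: width and height divisible by 32, and a frame count with F % 8 == 1. The diff does not show the body of `jobs.parse_resolution`, so the following is only a hypothetical pre-submit check (`check_resolution` is an illustrative name) that applies the stated rule. Both the old default `512x320x25` and the new default `768x512x49` pass it.

```python
import re

# Hypothetical pattern: three positive integers joined by a lowercase "x", as in the UI default.
_SPEC = re.compile(r"(\d+)x(\d+)x(\d+)")


def check_resolution(spec: str) -> tuple[int, int, int]:
    """Hypothetical check for a 'WxHxF' string, following the field's help text."""
    m = _SPEC.fullmatch(spec.strip())
    if m is None:
        raise ValueError(f"expected WxHxF (e.g. 768x512x49), got {spec!r}")
    w, h, f = (int(g) for g in m.groups())
    if w <= 0 or h <= 0 or w % 32 or h % 32:
        raise ValueError(f"width and height must be positive multiples of 32, got {w}x{h}")
    if f % 8 != 1:
        raise ValueError(f"frame count must satisfy F % 8 == 1 (25, 49, 81, ...), got {f}")
    return w, h, f


print(check_resolution("768x512x49"))  # (768, 512, 49)
```

Whether `parse_resolution` already enforces these divisibility rules is not visible in this diff; running a check like this in the `cpu-basic` Space surfaces a bad spec before any GPU time is billed.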

jobs.py CHANGED

@@ -8,8 +8,9 @@ Per training request:
 On the Job, the script syncs the source bucket + run bucket, runs `uv sync --frozen`
 (reproducing the working trainer env from the lockfile), downloads the base checkpoint
-and Gemma, then runs compute_reference (Canny) → process_dataset.py → train.py, which
-pushes the trained LoRA to the Hub.
 The Space itself only needs gradio + huggingface_hub + pyyaml (no torch).
 """
@@ -38,7 +39,7 @@ JOB_GEMMA = f"{JOB_ROOT}/gemma"
 JOB_RUN = f"{JOB_ROOT}/run"
 MODES = {
-    "IC-LoRA (video→video, Canny control)": {
         "needs_reference": True,
         "target_modules": [
             "attn1.to_k", "attn1.to_q", "attn1.to_v", "attn1.to_out.0",
@@ -80,6 +81,47 @@ def _collect_videos(uploaded: list[str], dest: Path) -> list[Path]:
     return sorted(p for p in dest.glob("*") if p.suffix.lower() in VIDEO_EXTS)
 def build_config_dict(params: dict, videos: list[Path]) -> dict:
     w, h, f = parse_resolution(params["resolution"])
     mode_cfg = MODES[params["mode"]]
@@ -182,14 +224,6 @@ def main():
         sh(["uv", "run", "python", *args], cwd=tr)
     ds_json = RUN / "dataset" / "dataset.json"
-    if jc.get("needs_reference") and jc.get("generate_canny_reference"):
-        print("=== 5a/6 generate Canny reference videos ===", flush=True)
-        uvrun(["scripts/compute_reference.py", str(RUN / "dataset"), "--output", str(ds_json)])
-        items = json.loads(ds_json.read_text())
-        for it in items:
-            if "reference_path" in it:
-                it["reference_video"] = it.pop("reference_path")
-        ds_json.write_text(json.dumps(items, indent=2))
     print("=== 5/6 preprocess dataset ===", flush=True)
     uvrun(["scripts/process_dataset.py", str(ds_json),
@@ -227,20 +261,15 @@ def submit(params: dict, uploaded_videos: list[str], flavor: str, timeout: str)
         videos = _collect_videos(uploaded_videos, tmp / "dataset" / "videos")
         if not videos:
             raise ValueError("No valid video files in the upload.")
-        # dataset.json (media_path + caption, relative to dataset dir)
-        caps = params.get("captions", [])
-        items = [{"media_path": f"videos/{v.name}",
-                  "caption": caps[i] if i < len(caps) and caps[i].strip() else "a video"}
-                 for i, v in enumerate(videos)]
         (tmp / "dataset" / "dataset.json").write_text(json.dumps(items, indent=2))
-        # config.yaml + job_config.json
-        cfg = build_config_dict(params, videos)
         (tmp / "config.yaml").write_text(yaml.safe_dump(cfg, sort_keys=False))
-        job_cfg = {"resolution": params["resolution"],
-                   "needs_reference": MODES[params["mode"]]["needs_reference"],
-                   "generate_canny_reference": bool(params.get("generate_canny_reference", True))}
         # job script reads job_config.json + config.yaml at the run root (sibling of dataset/)
-        (tmp / "job_config.json").write_text(json.dumps(job_cfg, indent=2))
         env = os.environ.copy()
         if token:

 On the Job, the script syncs the source bucket + run bucket, runs `uv sync --frozen`
 (reproducing the working trainer env from the lockfile), downloads the base checkpoint
+and Gemma, then runs process_dataset.py → train.py, which pushes the trained LoRA to the
+Hub. For IC-LoRA, references are user-supplied (paired `*_reference` videos) — no
+auto-derivation.
 The Space itself only needs gradio + huggingface_hub + pyyaml (no torch).
 """
 JOB_RUN = f"{JOB_ROOT}/run"
 MODES = {
+    "IC-LoRA (in-context control)": {
         "needs_reference": True,
         "target_modules": [
             "attn1.to_k", "attn1.to_q", "attn1.to_v", "attn1.to_out.0",
     return sorted(p for p in dest.glob("*") if p.suffix.lower() in VIDEO_EXTS)
+def _is_reference(p: Path) -> bool:
+    return p.stem.endswith("_reference")
+def build_dataset_items(
+    videos: list[Path], captions: list[str], needs_reference: bool
+) -> tuple[list[dict], list[Path]]:
+    """Build dataset.json rows. For IC-LoRA, pair each target `X.ext` with `X_reference.ext`
+    (user-supplied — no auto-derivation). Captions align to the sorted target clips.
+    Returns (items, targets). Raises ValueError on missing references or no targets.
+    """
+    vids = [v for v in videos if v.suffix.lower() in VIDEO_EXTS]
+    if needs_reference:
+        targets = sorted(v for v in vids if not _is_reference(v))
+        refs = {v.stem[: -len("_reference")]: v for v in vids if _is_reference(v)}
+    else:
+        targets, refs = sorted(vids), {}
+    items, missing = [], []
+    for i, v in enumerate(targets):
+        cap = captions[i] if i < len(captions) and captions[i].strip() else "a video"
+        row = {"media_path": f"videos/{v.name}", "caption": cap}
+        if needs_reference:
+            ref = refs.get(v.stem)
+            if ref is None:
+                missing.append(v.name)
+                continue
+            row["reference_video"] = f"videos/{ref.name}"
+        items.append(row)
+    if needs_reference and missing:
+        raise ValueError(
+            "Missing reference video(s) for: " + ", ".join(missing)
+            + ". For IC-LoRA, every target `X.mp4` needs a paired `X_reference.mp4`."
+        )
+    if not items:
+        raise ValueError("No target videos found in the upload.")
+    return items, targets
 def build_config_dict(params: dict, videos: list[Path]) -> dict:
     w, h, f = parse_resolution(params["resolution"])
     mode_cfg = MODES[params["mode"]]
         sh(["uv", "run", "python", *args], cwd=tr)
     ds_json = RUN / "dataset" / "dataset.json"
     print("=== 5/6 preprocess dataset ===", flush=True)
     uvrun(["scripts/process_dataset.py", str(ds_json),
         videos = _collect_videos(uploaded_videos, tmp / "dataset" / "videos")
         if not videos:
             raise ValueError("No valid video files in the upload.")
+        # dataset.json — pairs targets with user-supplied references for IC-LoRA
+        needs_reference = MODES[params["mode"]]["needs_reference"]
+        items, targets = build_dataset_items(videos, params.get("captions", []), needs_reference)
         (tmp / "dataset" / "dataset.json").write_text(json.dumps(items, indent=2))
+        # config.yaml + job_config.json (validation sample references the first target's reference)
+        cfg = build_config_dict(params, targets)
         (tmp / "config.yaml").write_text(yaml.safe_dump(cfg, sort_keys=False))
         # job script reads job_config.json + config.yaml at the run root (sibling of dataset/)
+        (tmp / "job_config.json").write_text(json.dumps({"resolution": params["resolution"]}, indent=2))
         env = os.environ.copy()
         if token:
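`build_dataset_items` raises when a target has no `X_reference` partner, but a `*_reference` clip whose target is absent is never looked up (the `refs` dict is only queried by target stem), so it ends up in no `dataset.json` row and no warning is raised. A hypothetical pre-flight audit can report both directions with the same stem-based matching. `audit_pairs` is an illustrative name, and it assumes the filenames were already filtered to video files.

```python
from pathlib import PurePath

REF_SUFFIX = "_reference"


def audit_pairs(filenames: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: list targets lacking a reference and references lacking a target.

    Matching is by stem, as in build_dataset_items, so `X.mp4` also pairs with
    `X_reference.mov`; extensions are not compared.
    """
    paths = [PurePath(n) for n in filenames]
    targets = {p.stem: p.name for p in paths if not p.stem.endswith(REF_SUFFIX)}
    refs = {p.stem[: -len(REF_SUFFIX)]: p.name for p in paths if p.stem.endswith(REF_SUFFIX)}
    return {
        "missing_reference": sorted(targets[s] for s in targets.keys() - refs.keys()),
        "orphan_reference": sorted(refs[s] for s in refs.keys() - targets.keys()),
    }


print(audit_pairs(["a.mp4", "a_reference.mp4", "b.mp4", "c_reference.mp4"]))
# {'missing_reference': ['b.mp4'], 'orphan_reference': ['c_reference.mp4']}
```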

requirements.txt CHANGED

@@ -1,4 +1,4 @@
-gradio>=6.18
 huggingface_hub[hf-xet]>=1.5
 pyyaml
 hf_transfer

+gradio[oauth]>=6.18
 huggingface_hub[hf-xet]>=1.5
 pyyaml
 hf_transfer