oumoumad/ltx-2.3-dearchive-lora

+---
+license: other
+license_name: ltx-video-license
+license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE.txt
+base_model: Lightricks/LTX-2.3
+tags:
+  - lora
+  - ic-lora
+  - ltx-video
+  - ltx-2.3
+  - video-restoration
+  - dearchive
+  - work-in-progress
+pipeline_tag: video-to-video
+library_name: peft
+---
+# dearchive · LTX-2.3 IC-LoRA  (🚧 WIP — checkpoint @ step 1500 / 5000)
+> **Status:** still training (step 1500 / 5000). This is a partial checkpoint published for early review. The recipe below is final; only the weights advance.
+
+An **In-Context LoRA (IC-LoRA)** for [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3) (dev branch, 22B) that turns *archive-style* video into a clean, modern, HD equivalent. The reference (input) is what the user feeds at inference: a real low-resolution, low-bitrate web archive clip, Lanczos-upscaled to model resolution. Conditioned on that reference, the model generates the restored video.
+| Aspect | Value |
+|---|---|
+| Base model | `Lightricks/LTX-2.3` (`ltx-2.3-22b-dev.safetensors`) |
+| Strategy | `video_to_video` (IC-LoRA, reference-conditioned) |
+| LoRA rank / alpha | 128 / 128 |
+| Trainable params | 855,638,016 (~0.86 B) |
+| Optimizer | Prodigy (D-Adaptation-style step-size estimation), `lr=1.0` (init scale), bias correction + safeguard warmup |
+| Scheduler | cosine |
+| Mixed precision | bf16 + int8-quanto |
+| Reference downscale | 1 (full res) |
+| Resolution buckets | `960×544×97; 960×544×49` |
+| Steps target | 5000 (this checkpoint = 1500) |
+| Save interval | every 500 steps |
+| Seed | 42 |
+## What it learns to undo
+YouTube uploads of archived mid-20th-century broadcast footage (Bruce Lee interviews, Chaplin web rips, etc.) are dominated by **resolution and compression loss**, not by silent-era film damage. The training pipeline mirrors that:
+```
+clean 1920×1080
+   → tonal degrade  (B&W via Rec.601 luma, optional family tint, contrast/gamma)
+   → capture-σ blur (tier-scaled, simulates lens / multi-gen optical printing)
+   → downscale to 360p / 270p / 240p (bilinear)
+   → low-bitrate h264 encode @ 60–320 kbps
+   → optional re-encode 1–3 generations (compounds compression artifacts)
+   → optional hqdn3d denoise (heavy tier only)
+   → Lanczos upscale back to 1920×1080  (matches inference-time user upscale)
+```
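
The chain above can be approximated with plain ffmpeg calls. The sketch below only builds the argument lists and is not the author's pipeline: `DegradeParams`, the helper names, the default blur σ and the CRF value are all hypothetical, and the tint and contrast/gamma steps are left out. `hue=s=0` stands in for the Rec.601 luma conversion and is only an approximation of it.

```python
from dataclasses import dataclass


@dataclass
class DegradeParams:
    # Hypothetical parameter bundle; field names and defaults are illustrative.
    height: int = 360         # archive tier: 360, 270 or 240 lines
    bitrate_kbps: int = 62    # inside the 60-320 kbps range
    blur_sigma: float = 1.5   # capture blur strength (assumed value)
    monochrome: bool = True   # tonal degrade to B&W
    denoise: bool = False     # hqdn3d, heavy tier only


def _even(n: int) -> int:
    # libx264 with 4:2:0 chroma needs even frame dimensions.
    return n + (n % 2)


def degrade_cmd(src: str, dst: str, p: DegradeParams) -> list[str]:
    """Tonal degrade -> blur -> bilinear downscale -> low-bitrate H.264."""
    width = _even(round(p.height * 16 / 9))
    filters = []
    if p.monochrome:
        # Desaturate; keeps the stream's existing luma plane, which only
        # approximates an explicit Rec.601 conversion.
        filters.append("hue=s=0")
    filters.append(f"gblur=sigma={p.blur_sigma}")
    filters.append(f"scale={width}:{p.height}:flags=bilinear")
    return ["ffmpeg", "-y", "-i", src, "-vf", ",".join(filters),
            "-c:v", "libx264", "-b:v", f"{p.bitrate_kbps}k", dst]


def reencode_cmd(src: str, dst: str, p: DegradeParams) -> list[str]:
    """One extra encoding generation at the same bitrate."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", "libx264", "-b:v", f"{p.bitrate_kbps}k", dst]


def upscale_cmd(src: str, dst: str, p: DegradeParams,
                width: int = 1920, height: int = 1080) -> list[str]:
    """Optional hqdn3d denoise, then Lanczos upscale back to target size."""
    filters = ["hqdn3d"] if p.denoise else []
    filters.append(f"scale={width}:{height}:flags=lanczos")
    # CRF 12 is an assumed near-transparent setting for the final file.
    return ["ffmpeg", "-y", "-i", src, "-vf", ",".join(filters),
            "-c:v", "libx264", "-crf", "12", dst]
```

Each list can be passed to `subprocess.run(cmd, check=True)`; calling `reencode_cmd` one to three times between the two other steps reproduces the multi-generation compounding.
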
+Three corruption *families* are sampled per pair:
+| Family | What it matches | Calibration ref |
+|---|---|---|
+| `chain_neutral` | neutral B&W broadcast tier | Bruce Lee Philosophy (yt nzQWYHHqvIw, 640×360 / 62 kbps) |
+| `tint_tape`    | cool-green VHS-tape oxidation | Bruce Lee Nunchucks (yt qHe6vhexm6g, 320×240 / 88 kbps) |
+| `tint_sepia`   | warm-brown film age fade | Safety Last (1923, sepia mid-tones) |
+
+Within each family we randomize: archive resolution tier (`very_low / low / med / high`), bitrate (~60–320 kbps), capture-blur σ, contrast/gamma, noise level, tint intensity, and number of encoding generations. The tape family gets the full chain at the heavy tier (heavily degraded output, in the class of the Bruce Lee Nunchucks reference). The neutral and sepia families scale the capture σ by 0.65× / 0.70× and halve the denoise probability, which preserves their gentler mid-tier character.
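
A minimal sketch of that per-pair sampling, assuming the 0.65 and 0.70 multipliers map to `chain_neutral` and `tint_sepia` in that order. The base σ range and base denoise probability are made-up placeholders, and the contrast/gamma, noise and tint draws are omitted for brevity.

```python
import random

TIERS = ("very_low", "low", "med", "high")
# Assumed mapping of the 0.65 / 0.70 multipliers to neutral / sepia.
SIGMA_SCALE = {"chain_neutral": 0.65, "tint_tape": 1.0, "tint_sepia": 0.70}
BASE_SIGMA_RANGE = (0.5, 2.5)  # hypothetical capture-blur range
BASE_DENOISE_PROB = 0.5        # hypothetical base probability


def sample_corruption(rng: random.Random, family: str) -> dict:
    """Draw one set of corruption parameters for a given family."""
    # Neutral and sepia halve the denoise probability; tape keeps it.
    denoise_prob = BASE_DENOISE_PROB * (1.0 if family == "tint_tape" else 0.5)
    return {
        "family": family,
        "tier": rng.choice(TIERS),
        "bitrate_kbps": rng.randint(60, 320),
        "blur_sigma": rng.uniform(*BASE_SIGMA_RANGE) * SIGMA_SCALE[family],
        "generations": rng.randint(0, 3),  # 0 = skip the optional re-encode
        "denoise": rng.random() < denoise_prob,
    }


rng = random.Random(42)  # fixed seed so the draws are reproducible
variants = [sample_corruption(rng, family) for family in SIGMA_SCALE]
```

Drawing one variant per family for each source clip gives the three corrupted variants per source described in the dataset section.
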
+## Dataset
+- **53 source clips** from a curated `outpaint/video` + `Focus_IC/videos` pool, landscape ≥720p, ≥6 s
+- **3 corrupted variants per source** → **159 pairs total**
+- **8 pairs held out** as a validation set spanning all 3 families × tier combos (151 in training)
+- All target/reference at **1920×1080 16:9** (matching aspects — this LoRA does **not** outpaint)
+- Frames: **97 frames @ 24 fps** (4.04 s; LTX-2 requires `n % 8 == 1`)
+- Caption (single, generic): *"A modern, high-resolution video shot in vivid color (or natural monochrome), with sharp detail, clean tonality, and contemporary cinematography."*
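
The counts and the frame-length rule above can be checked with a few lines. The helper names are illustrative; the only constraint taken from this card is `n % 8 == 1`.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n satisfies the n % 8 == 1 rule stated above."""
    return n >= 1 and n % 8 == 1


def snap_frame_count(n: int) -> int:
    """Largest valid frame count that does not exceed n (n must be >= 1)."""
    if n < 1:
        raise ValueError("need at least one frame")
    return ((n - 1) // 8) * 8 + 1


# Dataset arithmetic from the list above.
sources, variants_per_source, held_out = 53, 3, 8
total_pairs = sources * variants_per_source
train_pairs = total_pairs - held_out

# Clip duration for the main bucket.
duration_s = 97 / 24
```

For example, a 100-frame clip would be trimmed to `snap_frame_count(100)` frames before being paired.
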
+## Files
+| File | Step | Notes |
+|---|---|---|
+| `lora_weights_step_01500.safetensors` | 1500 | partial · published for early review |
+| `training_state_step_01500.pt` | 1500 | optimizer state (resume-from) |
+
+Subsequent step-2000 / 2500 / … checkpoints will be added as training advances.
+## Quick inference
+```bash
+git clone https://github.com/Lightricks/LTX-2.git && cd LTX-2 && uv sync
+uv pip install peft
+uv run python packages/ltx-trainer/scripts/inference.py \
+    --checkpoint /path/to/ltx-2.3-22b-dev.safetensors \
+    --text-encoder-path /path/to/gemma-3-12b-it-qat-q4_0-unquantized \
+    --lora-path /path/to/lora_weights_step_01500.safetensors \
+    --reference-video /path/to/your_lanczos_upscaled_archive.mp4 \
+    --prompt "A modern, high-resolution video shot in vivid color, sharp detail, contemporary cinematography." \
+    --width 960 --height 544 --num-frames 97 --frame-rate 24 \
+    --num-inference-steps 50 --guidance-scale 4.0 \
+    --output dearchive_restored.mp4
+```
+## Caveats — please read
+- **WIP**: the loss has not converged yet. Expect artifacts, weak temporal coherence, and incomplete colorization. The 5000-step run will land here when complete.
+- **Aspect**: target/ref pairs are 16:9. Real 4:3 archive footage should be padded or cropped to 16:9 by the caller — this LoRA does not learn to fill side bars.
+- **Training logs are silent**: the LTX trainer doesn't emit per-step logs (tqdm output is buffered), so the first observable signal is the step-500 checkpoint save (~40 min on H100 SXM, ~30 min on H200).
+- **Validation**: no validation samples were generated mid-training (the validation config did not match the trainer's schema); the held-out set will be used for batch inference at the end of the 5000-step run.
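
To prepare a reference clip as the Aspect caveat asks, one option is a single ffmpeg pass that pads or crops to the model frame and applies the Lanczos resize. This is a sketch, not part of the LTX tooling: `prepare_reference_cmd` is a hypothetical helper, the 960×544 / 97 frames / 24 fps defaults are copied from the quick-inference command, and the CRF value is an assumption.

```python
def prepare_reference_cmd(src: str, dst: str, mode: str = "pad",
                          width: int = 960, height: int = 544,
                          fps: int = 24, num_frames: int = 97) -> list[str]:
    """Build an ffmpeg command that fits a clip into the model frame.

    mode="pad"  keeps the whole picture and adds black bars.
    mode="crop" fills the frame and trims the overflow from the centre.
    """
    if mode not in ("pad", "crop"):
        raise ValueError("mode must be 'pad' or 'crop'")
    if num_frames % 8 != 1:
        raise ValueError("num_frames must satisfy n % 8 == 1")
    if mode == "pad":
        vf = (f"scale={width}:{height}:force_original_aspect_ratio=decrease"
              f":flags=lanczos,pad={width}:{height}:(ow-iw)/2:(oh-ih)/2")
    else:
        vf = (f"scale={width}:{height}:force_original_aspect_ratio=increase"
              f":flags=lanczos,crop={width}:{height}")
    return ["ffmpeg", "-y", "-i", src, "-vf", vf,
            "-r", str(fps), "-frames:v", str(num_frames),
            "-c:v", "libx264", "-crf", "12", dst]
```

Run the result with `subprocess.run(cmd, check=True)` and pass the output file as `--reference-video`. Padding leaves black side bars that this LoRA was not trained to fill, so cropping may give cleaner results on 4:3 material at the cost of losing the top and bottom of the picture.
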
+---
+*Trained on a Vast.ai H200 (kernel 6.8, driver 580.126.20). Total compute budget ~$35.*