initial WIP model card (step 1500)
README.md
ADDED
|
@@ -0,0 +1,107 @@
|
| 1 |
+
---
|
| 2 |
+
license: other
|
| 3 |
+
license_name: ltx-video-license
|
| 4 |
+
license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE.txt
|
| 5 |
+
base_model: Lightricks/LTX-2.3
|
| 6 |
+
tags:
|
| 7 |
+
- lora
|
| 8 |
+
- ic-lora
|
| 9 |
+
- ltx-video
|
| 10 |
+
- ltx-2.3
|
| 11 |
+
- video-restoration
|
| 12 |
+
- dearchive
|
| 13 |
+
- work-in-progress
|
| 14 |
+
pipeline_tag: video-to-video
|
| 15 |
+
library_name: peft
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# dearchive · LTX-2.3 IC-LoRA (🚧 WIP: checkpoint @ step 1500 / 5000)
|
| 19 |
+
|
| 20 |
+
> **Status:** still training (step 1500 / 5000). This is a partial checkpoint published for early review. The recipe below is final; only the weights advance.
|
| 21 |
+
|
| 22 |
+
An **In-Context LoRA (IC-LoRA)** for [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3) (dev branch, 22B) that turns *archive-style* video into a clean, modern, HD equivalent. The reference (input) is what the user supplies at inference: a real low-resolution, low-bitrate web archive clip, Lanczos-upscaled to model resolution. The model then generates the restored video.
|
| 23 |
+
|
| 24 |
+
| Aspect | Value |
|
| 25 |
+
|---|---|
|
| 26 |
+
| Base model | `Lightricks/LTX-2.3` (`ltx-2.3-22b-dev.safetensors`) |
|
| 27 |
+
| Strategy | `video_to_video` (IC-LoRA, reference-conditioned) |
|
| 28 |
+
| LoRA rank / alpha | 128 / 128 |
|
| 29 |
+
| Trainable params | 855,638,016 (~0.86 B) |
|
| 30 |
+
| Optimizer | Prodigy (D-Adaptation), `lr=1.0` (init scale), bias-correction + safeguard-warmup |
|
| 31 |
+
| Scheduler | cosine |
|
| 32 |
+
| Precision | bf16 mixed precision + int8-quanto quantization |
|
| 33 |
+
| Reference downscale | 1 (full res) |
|
| 34 |
+
| Resolution buckets | `960×544×97; 960×544×49` |
|
| 35 |
+
| Steps target | 5000 (this checkpoint = 1500) |
|
| 36 |
+
| Save interval | every 500 steps |
|
| 37 |
+
| Seed | 42 |
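
Two numbers in the table can be sanity-checked by hand. The sketch below assumes the standard LoRA formulation (update scaled by `alpha / rank`, and `rank * (d_in + d_out)` extra weights per adapted linear layer); which layers the trainer actually adapts is not stated in this card, so the helper names are illustrative only.

```python
RANK = 128
ALPHA = 128
TRAINABLE_PARAMS = 855_638_016  # "Trainable params" row of the table


def lora_scale(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA."""
    return alpha / rank


def summed_layer_dims(trainable: int, rank: int) -> int:
    """Sum of (d_in + d_out) over all adapted layers.

    Assumes every adapted layer contributes rank * (d_in + d_out) weights.
    """
    if trainable % rank:
        raise ValueError("parameter count is not a multiple of the rank")
    return trainable // rank


print(lora_scale(ALPHA, RANK))                    # 1.0
print(summed_layer_dims(TRAINABLE_PARAMS, RANK))  # 6684672
```

With rank equal to alpha the update is applied unscaled, and the parameter count divides evenly by the rank, which is consistent with a single rank being used for every adapted layer.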
|
| 38 |
+
|
| 39 |
+
## What it learns to undo
|
| 40 |
+
|
| 41 |
+
Real archive YouTube uploads of mid-20th-century broadcast footage (Bruce Lee interviews, Chaplin web rips, etc.) are dominated by **resolution + compression loss**, not silent-era film damage. The training pipeline mirrors that:
|
| 42 |
+
|
| 43 |
+
```
|
| 44 |
+
clean 1920×1080
|
| 45 |
+
→ tonal degrade (B&W via Rec.601 luma, optional family tint, contrast/gamma)
|
| 46 |
+
→ capture-σ blur (tier-scaled, simulates lens / multi-gen optical printing)
|
| 47 |
+
→ downscale to 360p / 270p / 240p (bilinear)
|
| 48 |
+
→ low-bitrate h264 encode @ 60–320 kbps
|
| 49 |
+
→ optional re-encode 1–3 generations (compounds compression artifacts)
|
| 50 |
+
→ optional hqdn3d denoise (heavy tier only)
|
| 51 |
+
→ Lanczos upscale back to 1920×1080 (matches inference-time user upscale)
|
| 52 |
+
```
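
The chain above can be approximated with plain `ffmpeg` invocations. The following is a minimal sketch, not the training code: it only builds the command lists for the blur, downscale, encode, re-encode, denoise and upscale stages, and leaves the tonal step as a standalone Rec.601 luma helper. `DegradeParams`, `degrade_commands`, the default blur sigma, the intermediate file names, dropping audio with `-an`, and the final `-crf 12` encode are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DegradeParams:
    height: int = 360           # archive tier from the card: 360, 270 or 240
    bitrate_kbps: int = 120     # card range: 60-320 kbps
    blur_sigma: float = 1.2     # hypothetical capture-blur strength
    extra_generations: int = 1  # optional re-encodes (card: 1-3)
    denoise: bool = False       # hqdn3d, heavy tier only in the card


def rec601_luma(r: float, g: float, b: float) -> float:
    """Rec.601 luma weights, as named in the tonal-degrade step."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def degrade_commands(src: str, out: str, p: DegradeParams) -> list:
    """Return ffmpeg argument lists; nothing is executed here."""
    encode = ["-c:v", "libx264", "-b:v", f"{p.bitrate_kbps}k", "-an"]
    # Stage 1: blur, bilinear downscale, low-bitrate H.264 encode.
    down = f"gblur=sigma={p.blur_sigma},scale=-2:{p.height}:flags=bilinear"
    current = f"{out}.gen0.mp4"
    cmds = [["ffmpeg", "-y", "-i", src, "-vf", down, *encode, current]]
    # Stage 2: optional re-encodes compound the compression artifacts.
    for i in range(1, p.extra_generations + 1):
        nxt = f"{out}.gen{i}.mp4"
        cmds.append(["ffmpeg", "-y", "-i", current, *encode, nxt])
        current = nxt
    # Stage 3: optional denoise, then Lanczos upscale back to 1920x1080.
    up = "scale=1920:1080:flags=lanczos"
    if p.denoise:
        up = "hqdn3d," + up
    cmds.append(["ffmpeg", "-y", "-i", current, "-vf", up,
                 "-c:v", "libx264", "-crf", "12", "-an", out])
    return cmds
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with `ffmpeg` installed.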
|
| 53 |
+
|
| 54 |
+
Three corruption *families* are sampled per pair:
|
| 55 |
+
|
| 56 |
+
| Family | What it matches | Calibration ref |
|
| 57 |
+
|---|---|---|
|
| 58 |
+
| `chain_neutral` | neutral B&W broadcast tier | Bruce Lee Philosophy (yt nzQWYHHqvIw, 640×360 / 62 kbps) |
|
| 59 |
+
| `tint_tape` | cool-green VHS-tape oxidation | Bruce Lee Nunchucks (yt qHe6vhexm6g, 320×240 / 88 kbps) |
|
| 60 |
+
| `tint_sepia` | warm-brown film age fade | Safety Last (1923, sepia mid-tones) |
|
| 61 |
+
|
| 62 |
+
Within each family we randomize: archive resolution tier (`very_low / low / med / high`), bitrate (~60–320 kbps), capture-blur σ, contrast/gamma, noise level, tint intensity, and number of encoding generations. The tape family gets the heavy chain at the heavy tier (heavily degraded output in the class of the Bruce Lee Nunchucks reference); the neutral and sepia families use 0.65 / 0.70 multipliers on the capture σ and 0.5× the denoise probability, so they preserve the gentler mid-tier character.
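
The per-family randomization described above might look like the sketch below. Only the family names, tier names, bitrate range and the 0.65 / 0.70 / 0.5× multipliers come from this card; the base sigma range, the base denoise probability, the 0–3 range for extra encoding generations, and the reading that 0.65 applies to the neutral family and 0.70 to sepia are assumptions.

```python
import random
from dataclasses import dataclass

FAMILIES = ("chain_neutral", "tint_tape", "tint_sepia")
TIERS = ("very_low", "low", "med", "high")

# Multipliers from the card; the tape family keeps the unscaled chain.
SIGMA_MULT = {"chain_neutral": 0.65, "tint_sepia": 0.70, "tint_tape": 1.0}
DENOISE_MULT = {"chain_neutral": 0.5, "tint_sepia": 0.5, "tint_tape": 1.0}

BASE_SIGMA_RANGE = (0.5, 2.0)  # hypothetical, not stated in the card
BASE_DENOISE_PROB = 0.5        # hypothetical, not stated in the card


@dataclass(frozen=True)
class Corruption:
    family: str
    tier: str
    bitrate_kbps: int
    blur_sigma: float
    denoise_prob: float
    extra_generations: int


def sample_corruption(family: str, rng: random.Random) -> Corruption:
    """Draw one corruption recipe for the given family."""
    base_sigma = rng.uniform(*BASE_SIGMA_RANGE)
    return Corruption(
        family=family,
        tier=rng.choice(TIERS),
        bitrate_kbps=rng.randint(60, 320),
        blur_sigma=base_sigma * SIGMA_MULT[family],
        denoise_prob=BASE_DENOISE_PROB * DENOISE_MULT[family],
        extra_generations=rng.randint(0, 3),
    )


rng = random.Random(42)
variants = [sample_corruption(f, rng) for f in FAMILIES]
```

Passing an explicit `random.Random` instance keeps the draws reproducible for a fixed seed without touching the global random state.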
|
| 63 |
+
|
| 64 |
+
## Dataset
|
| 65 |
+
|
| 66 |
+
- **53 source clips** from a curated `outpaint/video` + `Focus_IC/videos` pool, landscape ≥720p, ≥6 s
|
| 67 |
+
- **3 corrupted variants per source** → **159 pairs total**
|
| 68 |
+
- **8 pairs held out** as a validation set spanning all 3 families × tier combos (151 in training)
|
| 69 |
+
- All target/reference at **1920×1080 16:9** (matching aspects; this LoRA does **not** outpaint)
|
| 70 |
+
- Frames: **97 frames @ 24 fps** (4.04 s; LTX-2 requires `n % 8 == 1`)
|
| 71 |
+
- Caption (single, generic): *"A modern, high-resolution video shot in vivid color (or natural monochrome), with sharp detail, clean tonality, and contemporary cinematography."*
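
The counts in this list and the `n % 8 == 1` frame rule are easy to check. The helpers below are a small illustrative sketch (the function names are not part of the LTX tooling): one validates a frame count, the other snaps an arbitrary clip length down to the nearest valid count.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n satisfies the n % 8 == 1 rule quoted above."""
    return n >= 1 and n % 8 == 1


def snap_frame_count(n: int) -> int:
    """Largest valid frame count that does not exceed n."""
    if n < 1:
        raise ValueError("need at least one frame")
    return n - ((n - 1) % 8)


SOURCES, VARIANTS, HELD_OUT = 53, 3, 8
pairs = SOURCES * VARIANTS     # 159
train_pairs = pairs - HELD_OUT  # 151
duration_s = 97 / 24            # about 4.04 s at 24 fps

print(pairs, train_pairs, round(duration_s, 2))
print(snap_frame_count(100), snap_frame_count(49))
```

Both bucket lengths in the training table (97 and 49 frames) satisfy the rule, so clips trimmed with `snap_frame_count` land on lengths the model accepts.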
|
| 72 |
+
|
| 73 |
+
## Files
|
| 74 |
+
|
| 75 |
+
| File | Step | Notes |
|
| 76 |
+
|---|---|---|
|
| 77 |
+
| `lora_weights_step_01500.safetensors` | 1500 | partial · published for early review |
|
| 78 |
+
| `training_state_step_01500.pt` | 1500 | optimizer state (resume-from) |
|
| 79 |
+
|
| 80 |
+
Subsequent step-2000 / 2500 / … checkpoints will be added as training advances.
|
| 81 |
+
|
| 82 |
+
## Quick inference
|
| 83 |
+
|
| 84 |
+
```bash
|
| 85 |
+
git clone https://github.com/Lightricks/LTX-2.git && cd LTX-2 && uv sync
|
| 86 |
+
uv pip install peft
|
| 87 |
+
|
| 88 |
+
uv run python packages/ltx-trainer/scripts/inference.py \
|
| 89 |
+
--checkpoint /path/to/ltx-2.3-22b-dev.safetensors \
|
| 90 |
+
--text-encoder-path /path/to/gemma-3-12b-it-qat-q4_0-unquantized \
|
| 91 |
+
--lora-path /path/to/lora_weights_step_01500.safetensors \
|
| 92 |
+
--reference-video /path/to/your_lanczos_upscaled_archive.mp4 \
|
| 93 |
+
--prompt "A modern, high-resolution video shot in vivid color, sharp detail, contemporary cinematography." \
|
| 94 |
+
--width 960 --height 544 --num-frames 97 --frame-rate 24 \
|
| 95 |
+
--num-inference-steps 50 --guidance-scale 4.0 \
|
| 96 |
+
--output dearchive_restored.mp4
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
## Caveats: please read
|
| 100 |
+
|
| 101 |
+
- **WIP**: the loss has not converged yet. Expect artifacts, weak temporal coherence, and incomplete colorization. The 5000-step run will land here when complete.
|
| 102 |
+
- **Aspect**: target/ref pairs are 16:9. Real 4:3 archive footage should be padded or cropped to 16:9 by the caller; this LoRA does not learn to fill side bars.
|
| 103 |
+
- **First step is silent**: the LTX trainer doesn't emit per-step logs (tqdm-buffered). The first observable signal is the step-500 checkpoint save (~40 min on H100 SXM, ~30 min on H200).
|
| 104 |
+
- **Validation**: validation samples were not generated mid-training (config schema confusion); the held-out set will be used for batch inference at the end of the 5000-step run.
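
For the aspect caveat, the padding geometry can be computed before calling `ffmpeg`. This is a sketch under stated assumptions: it targets the 960×544 inference size used in the Quick inference command, fits the source inside that frame with even dimensions, and emits a `scale` + `pad` filter string. The function names and the choice of black bars are illustrative, not part of this repository.

```python
def fit_inside(src_w: int, src_h: int, dst_w: int = 960, dst_h: int = 544):
    """Scaled size (even dimensions) and pad offsets, aspect preserved."""
    if src_w * dst_h <= dst_w * src_h:
        # Source is narrower than the target: fill the height, pad the sides.
        h = dst_h
        w = src_w * dst_h // src_h
    else:
        # Source is wider than the target: fill the width, pad top and bottom.
        w = dst_w
        h = src_h * dst_w // src_w
    w -= w % 2  # H.264 with 4:2:0 chroma needs even dimensions
    h -= h % 2
    return w, h, (dst_w - w) // 2, (dst_h - h) // 2


def pad_filter(src_w: int, src_h: int, dst_w: int = 960, dst_h: int = 544) -> str:
    """ffmpeg -vf string: Lanczos upscale, then pad to the target frame."""
    w, h, x, y = fit_inside(src_w, src_h, dst_w, dst_h)
    return f"scale={w}:{h}:flags=lanczos,pad={dst_w}:{dst_h}:{x}:{y}:black"


print(pad_filter(320, 240))
# scale=724:544:flags=lanczos,pad=960:544:118:0:black
```

The resulting string can be used as `ffmpeg -i in.mp4 -vf "<filter>" out.mp4`. Padded bars stay in the output, so cropping is the alternative when the bars are unwanted.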
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
*Trained on a Vast.ai H200 (kernel 6.8, driver 580.126.20). Total compute budget ~$35.*
|