Commit 5d1fc1d (verified) by oumoumad · 1 parent: a90594c

initial WIP model card (step 1500)

Files changed (1): README.md (added, +107 -0)


---
license: other
license_name: ltx-video-license
license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE.txt
base_model: Lightricks/LTX-2.3
tags:
- lora
- ic-lora
- ltx-video
- ltx-2.3
- video-restoration
- dearchive
- work-in-progress
pipeline_tag: video-to-video
library_name: peft
---

# dearchive · LTX-2.3 IC-LoRA (🚧 WIP: checkpoint @ step 1500 / 5000)

> **Status:** still training (step 1500 / 5000). This is a partial checkpoint published for early review. The recipe below is final; only the weights will change as training continues.

An **Image-Conditioning LoRA** for [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3) (dev branch, 22B) that turns *archive-style* video into a clean, modern, HD equivalent. The reference (input) is what the user supplies at inference time: a real low-resolution, low-bitrate web-archive clip, Lanczos-upscaled to the model resolution. The model then generates the restored video.

| Aspect | Value |
|---|---|
| Base model | `Lightricks/LTX-2.3` (`ltx-2.3-22b-dev.safetensors`) |
| Strategy | `video_to_video` (IC-LoRA, reference-conditioned) |
| LoRA rank / alpha | 128 / 128 |
| Trainable params | 855,638,016 (~0.86 B) |
| Optimizer | Prodigy (D-Adaptation family), `lr=1.0` (initial scale), bias correction + safeguard warmup |
| Scheduler | cosine |
| Precision / quantization | bf16 mixed precision + int8 (quanto) |
| Reference downscale | 1 (reference at full resolution) |
| Resolution buckets (W×H×frames) | `960×544×97; 960×544×49` |
| Steps target | 5000 (this checkpoint = 1500) |
| Save interval | every 500 steps |
| Seed | 42 |
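
For orientation, the rank/alpha and scheduler rows can be read with a little arithmetic. The Python sketch below assumes the standard LoRA formulation (the low-rank update `B @ A` is scaled by alpha / rank) and a plain cosine decay to zero with no warmup; this card does not state the trainer's exact schedule or which layers are adapted, and the 4096-wide layer is a hypothetical example, not an LTX-2.3 dimension.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters one adapter adds to a d_in -> d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def cosine_multiplier(step: int, total_steps: int) -> float:
    """Assumed plain cosine decay from 1.0 to 0.0 over total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(lora_scale(128, 128))  # 1.0: alpha == rank leaves the update unscaled
    # 4096 is a hypothetical layer width, not taken from LTX-2.3.
    print(lora_param_count(4096, 4096, 128))  # 1048576
    print(round(cosine_multiplier(1500, 5000), 3))  # 0.794 at this checkpoint
```

With Prodigy, the step size actually applied is roughly its adaptive estimate `d` times `lr` times this multiplier, so `lr=1.0` acts as a scale on Prodigy's own estimate rather than as a literal learning rate.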

## What it learns to undo

Real archive uploads on YouTube, such as mid-20th-century broadcast footage (Bruce Lee interviews) or web rips of Chaplin films, suffer mainly from **resolution and compression loss**, not silent-era film damage. The training pipeline reproduces that degradation:

```
clean 1920×1080
  → tonal degrade (B&W via Rec.601 luma, optional family tint, contrast/gamma)
  → capture-σ blur (tier-scaled, simulates lens / multi-gen optical printing)
  → downscale to 360p / 270p / 240p (bilinear)
  → low-bitrate h264 encode @ 60–320 kbps
  → optional re-encode, 1–3 generations (compounds compression artifacts)
  → optional hqdn3d denoise (heavy tier only)
  → Lanczos upscale back to 1920×1080 (matches the user's upscale at inference time)
```
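
The chain above maps naturally onto ffmpeg filters. The Python sketch below only builds the ffmpeg command lines (it does not run them unless you call `run_all`) and is an assumption about how such a chain could be reproduced, not the actual training script. The filter choices (`colorchannelmixer` with Rec.601 weights, `eq`, `gblur`, `hqdn3d`, `scale` with `bilinear`/`lanczos` flags), the lossless final encode, the intermediate file names and all default values are illustrative, and the family tint step is omitted.

```python
import subprocess
from typing import List, Sequence

# Rec.601 luma weights (0.299, 0.587, 0.114) copied into R, G and B -> neutral B&W.
REC601_GRAY = "colorchannelmixer=" + ":".join(["0.299:0.587:0.114:0"] * 3)


def build_degrade_commands(
    src: str,
    dst: str,
    height: int,            # archive tier height, e.g. 360, 270 or 240
    kbps: int,              # h264 bitrate, roughly 60-320 kbps
    sigma: float,           # capture blur sigma
    contrast: float = 1.0,
    gamma: float = 1.0,
    generations: int = 1,   # total lossy h264 encodes (1 = no re-encode)
    denoise: bool = False,  # hqdn3d, heavy tier only
    tmp_prefix: str = "degrade_gen",  # hypothetical intermediate file prefix
) -> List[List[str]]:
    """Return one ffmpeg argv list per pass, mirroring the chain above."""
    lossy = ["-c:v", "libx264", "-b:v", f"{kbps}k", "-pix_fmt", "yuv420p", "-an"]

    # Pass 1: tonal degrade -> capture blur -> bilinear downscale -> lossy encode.
    first_vf = ",".join([
        REC601_GRAY,
        f"eq=contrast={contrast}:gamma={gamma}",
        f"gblur=sigma={sigma}",
        f"scale=-2:{height}:flags=bilinear",  # -2 keeps the aspect with an even width
    ])
    prev = f"{tmp_prefix}1.mp4"
    cmds = [["ffmpeg", "-y", "-i", src, "-vf", first_vf, *lossy, prev]]

    # Optional extra generations: decode and re-encode at the same bitrate.
    for gen in range(2, generations + 1):
        out = f"{tmp_prefix}{gen}.mp4"
        cmds.append(["ffmpeg", "-y", "-i", prev, *lossy, out])
        prev = out

    # Final pass: optional denoise, then Lanczos upscale back to 1920x1080,
    # encoded in x264 lossless mode (-qp 0) so it adds no further codec loss.
    final_vf = ",".join((["hqdn3d"] if denoise else []) + ["scale=1920:1080:flags=lanczos"])
    cmds.append(["ffmpeg", "-y", "-i", prev, "-vf", final_vf,
                 "-c:v", "libx264", "-qp", "0", "-pix_fmt", "yuv420p", "-an", dst])
    return cmds


def run_all(cmds: Sequence[Sequence[str]]) -> None:
    """Execute the passes in order (needs ffmpeg with libx264 on PATH)."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Splitting the chain into separate passes is what makes the re-encode generations real: each pass decodes the previous lossy file, so compression artifacts compound instead of being applied once.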

Three corruption *families* are sampled per pair:

| Family | What it matches | Calibration reference |
|---|---|---|
| `chain_neutral` | neutral B&W broadcast tier | Bruce Lee Philosophy (YouTube nzQWYHHqvIw, 640×360 @ 62 kbps) |
| `tint_tape` | cool-green VHS-tape oxidation | Bruce Lee Nunchucks (YouTube qHe6vhexm6g, 320×240 @ 88 kbps) |
| `tint_sepia` | warm-brown film age fade | Safety Last (1923, sepia mid-tones) |

Within each family we randomize the archive resolution tier (`very_low / low / med / high`), bitrate (~60–320 kbps), capture-blur σ, contrast/gamma, noise level, tint intensity, and the number of encoding generations. At the heavy tier, the tape family receives the full heavy chain (output as badly degraded as the Bruce Lee Nunchucks reference); the neutral and sepia families scale the capture σ by 0.65 and 0.70 respectively and halve the denoise probability, so they keep their gentler mid-tier character.
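
A per-pair parameter sampler along these lines could look like the Python sketch below. Only the family multipliers (0.65 / 0.70 on capture σ, 0.5× denoise probability for neutral and sepia) and the 60–320 kbps range come from this card; the tier table (heights, base σ, base denoise probability), the ±20 % σ jitter and the re-encode distribution are hypothetical placeholders, and tint, contrast/gamma and noise sampling are left out.

```python
import random
from dataclasses import dataclass
from typing import Optional

# From the card: capture-sigma and denoise-probability multipliers per family.
SIGMA_MULT = {"tint_tape": 1.0, "chain_neutral": 0.65, "tint_sepia": 0.70}
DENOISE_MULT = {"tint_tape": 1.0, "chain_neutral": 0.5, "tint_sepia": 0.5}

# Hypothetical tier table: (archive height, base capture sigma, base denoise prob).
# Denoise is only possible at the heaviest tier, as the chain above describes.
TIERS = {
    "very_low": (240, 2.0, 0.6),
    "low": (270, 1.5, 0.0),
    "med": (360, 1.0, 0.0),
    "high": (360, 0.6, 0.0),
}


@dataclass
class CorruptionParams:
    family: str
    tier: str
    height: int
    kbps: int
    sigma: float
    denoise: bool
    extra_generations: int  # 0 = single encode, 1-3 = extra re-encodes


def sample_params(family: str, rng: random.Random,
                  tier: Optional[str] = None) -> CorruptionParams:
    """Draw one pair's corruption settings; pass `tier` to pin the tier."""
    if family not in SIGMA_MULT:
        raise ValueError(f"unknown family: {family}")
    tier = tier or rng.choice(sorted(TIERS))
    height, base_sigma, base_denoise_p = TIERS[tier]
    # Hypothetical +/-20% jitter, then the family's sigma multiplier.
    sigma = base_sigma * rng.uniform(0.8, 1.2) * SIGMA_MULT[family]
    denoise_p = base_denoise_p * DENOISE_MULT[family]
    return CorruptionParams(
        family=family,
        tier=tier,
        height=height,
        kbps=rng.randint(60, 320),
        sigma=sigma,
        denoise=rng.random() < denoise_p,
        extra_generations=rng.choice([0, 1, 2, 3]),
    )


if __name__ == "__main__":
    rng = random.Random(42)  # matches the training seed, for illustration only
    for family in SIGMA_MULT:
        print(sample_params(family, rng))
```

The usage loop draws one variant per family, which is only one possible reading of "3 corrupted variants per source"; drawing the family at random per pair would be equally consistent with the text.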

## Dataset

- **53 source clips** from a curated `outpaint/video` + `Focus_IC/videos` pool, landscape, ≥720p, ≥6 s long
- **3 corrupted variants per source** → **159 pairs total**
- **8 pairs held out** as a validation set spanning all 3 families and a mix of tiers (151 pairs used for training)
- All targets and references at **1920×1080, 16:9** (reference and target share the same aspect ratio; this LoRA does **not** outpaint)
- Frames: **97 frames @ 24 fps** (4.04 s; LTX-2 requires `n % 8 == 1`)
- Caption (single, generic): *"A modern, high-resolution video shot in vivid color (or natural monochrome), with sharp detail, clean tonality, and contemporary cinematography."*
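
The frame-count rule and the dataset arithmetic can be checked in a few lines of Python. `largest_valid_frames` is a hypothetical helper, not part of the LTX trainer, showing how a longer clip would be trimmed to the nearest length that satisfies `n % 8 == 1`.

```python
def is_valid_frame_count(n: int) -> bool:
    """LTX-2 rule quoted in this card: frame counts must satisfy n % 8 == 1."""
    return n >= 1 and n % 8 == 1


def largest_valid_frames(available: int) -> int:
    """Hypothetical helper: largest n <= available with n % 8 == 1."""
    if available < 1:
        raise ValueError("need at least one frame")
    return (available - 1) // 8 * 8 + 1


if __name__ == "__main__":
    fps = 24
    print(is_valid_frame_count(97), is_valid_frame_count(49))  # True True
    print(round(97 / fps, 2))  # 4.04 seconds per training clip
    print(largest_valid_frames(6 * fps))  # 137: a 6 s (144-frame) clip, trimmed
    sources, variants, held_out = 53, 3, 8
    print(sources * variants, sources * variants - held_out)  # 159 151
```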

## Files

| File | Step | Notes |
|---|---|---|
| `lora_weights_step_01500.safetensors` | 1500 | partial; published for early review |
| `training_state_step_01500.pt` | 1500 | optimizer state (for resuming training) |

Checkpoints for steps 2000, 2500, … will be added as training advances.

## Quick inference

```bash
# Clone the LTX-2 repository and install its dependencies
git clone https://github.com/Lightricks/LTX-2.git && cd LTX-2 && uv sync
uv pip install peft

# Run the trainer's inference script with this LoRA and a pre-upscaled reference clip
uv run python packages/ltx-trainer/scripts/inference.py \
  --checkpoint /path/to/ltx-2.3-22b-dev.safetensors \
  --text-encoder-path /path/to/gemma-3-12b-it-qat-q4_0-unquantized \
  --lora-path /path/to/lora_weights_step_01500.safetensors \
  --reference-video /path/to/your_lanczos_upscaled_archive.mp4 \
  --prompt "A modern, high-resolution video shot in vivid color, sharp detail, contemporary cinematography." \
  --width 960 --height 544 --num-frames 97 --frame-rate 24 \
  --num-inference-steps 50 --guidance-scale 4.0 \
  --output dearchive_restored.mp4
```
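
The reference passed to `--reference-video` is expected to be Lanczos-upscaled already. One way to prepare it is sketched below in Python as an ffmpeg command builder; this is an assumption, not a script shipped with the trainer. It defaults to the 960×544 size and 97 frames at 24 fps used in the inference command, and handles 4:3 archive footage by either padding (pillarboxing) or center-cropping to the target frame. The CRF value and file names are illustrative.

```python
import shlex
from typing import List


def reference_prep_command(
    src: str,
    dst: str,
    width: int = 960,
    height: int = 544,
    fps: int = 24,
    frames: int = 97,
    mode: str = "pad",
) -> List[str]:
    """Build an ffmpeg argv that Lanczos-scales an archive clip to width x height."""
    if mode == "pad":
        # Fit inside the frame, then centre it on black bars (pillarbox for 4:3).
        vf = (f"scale={width}:{height}:force_original_aspect_ratio=decrease:flags=lanczos,"
              f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2")
    elif mode == "crop":
        # Fill the frame, then centre-crop whatever overflows.
        vf = (f"scale={width}:{height}:force_original_aspect_ratio=increase:flags=lanczos,"
              f"crop={width}:{height}")
    else:
        raise ValueError(f"mode must be 'pad' or 'crop', got {mode!r}")
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", vf,
        "-r", str(fps),            # resample to the target frame rate
        "-frames:v", str(frames),  # stop after `frames` output frames
        "-c:v", "libx264", "-crf", "12", "-pix_fmt", "yuv420p",
        "-an", dst,
    ]


if __name__ == "__main__":
    cmd = reference_prep_command("archive_clip.mp4", "your_lanczos_upscaled_archive.mp4")
    print(shlex.join(cmd))
```

Note that 960×544 is slightly taller than 16:9 (which would be 960×540), so in `pad` mode even a 16:9 source gets 2-pixel bars at the top and bottom; `crop` mode avoids bars at the cost of trimming a few pixels from each side.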

## Caveats (please read)

- **WIP**: the loss has not converged yet. Expect artifacts, weak temporal coherence, and incomplete colorization. The final 5000-step weights will be uploaded here once training completes.
- **Aspect ratio**: target/reference pairs are 16:9. Real 4:3 archive footage must be padded or cropped to 16:9 by the caller; this LoRA has not learned to fill side bars.
- **Silent until the first checkpoint**: the LTX trainer does not emit per-step logs (tqdm output is buffered), so the first observable signal is the step-500 checkpoint save (~40 min on an H100 SXM, ~30 min on an H200).
- **Validation**: no validation samples were generated mid-training (because of a config-schema mix-up); the held-out set will be used for batch inference at the end of the 5000-step run.

---
*Trained on a Vast.ai H200 (kernel 6.8, driver 580.126.20). Total compute budget ~$35.*