on1onmangoes / Agents
Runtime error

Exit code: 1. Reason:
4 05:49:57,421 INFO Audio-only mode: freed video components, saved 0.0GB VRAM
2026-05-14 05:49:57,421 INFO PromptEncoder (warm): 32.2s
2026-05-14 05:49:57,475 INFO AudioConditioner (warm): 0.1s
Traceback (most recent call last):
  File "/app/app.py", line 32, in <module>
    tts = TTSServer(
        checkpoint=PATHS["transformer"],
    ...<5 lines>...
        bnb_4bit=True,  # unsloth Gemma is pre-quantized
    )
  File "/app/src/inference_server.py", line 116, in __init__
    self._load_all()
    ~~~~~~~~~~~~~~^^
  File "/app/src/inference_server.py", line 185, in _load_all
    self._velocity_model = builder.build(device=self.device, dtype=self.dtype).to(self.device).eval()
                           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ltx2/ltx_core/loader/single_gpu_model_builder.py", line 111, in build
    model_state_dict = self.load_sd(model_paths, sd_ops=self.model_sd_ops, registry=self.registry, device=device)
  File "/app/ltx2/ltx_core/loader/single_gpu_model_builder.py", line 88, in load_sd
    state_dict = self.model_loader.load(paths, sd_ops=sd_ops, device=device)
  File "/app/ltx2/ltx_core/loader/sft_loader.py", line 66, in load
    return self.weight_loader.load(path, sd_ops, device)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ltx2/ltx_core/loader/sft_loader.py", line 36, in load
    value = f.get_tensor(name).to(device=device, non_blocking=True, copy=False)
            ~~~~~~~~~~~~^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 16.19 MiB is free. Process 540318 has 14.72 GiB memory in use. Of the allocated memory 13.91 GiB is allocated by PyTorch, and 729.25 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
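
The error message ends with a hint about PYTORCH_CUDA_ALLOC_CONF. The sketch below is one possible way to act on it, not the Space's actual code: `summarize_oom` and `configure_allocator` are hypothetical helper names introduced here for illustration. It pulls the memory figures out of the message so they can be compared, and it sets the allocator option in the environment, which would need to happen before `torch` is imported for the setting to be picked up reliably. Note that the log reports only 729.25 MiB as reserved but unallocated while 13.91 GiB is genuinely allocated on a 14.74 GiB GPU, so reducing fragmentation may well not be enough on its own.

```python
import os
import re

# The OOM line from the log above, copied verbatim (shortened to the figures).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total "
    "capacity of 14.74 GiB of which 16.19 MiB is free. Process 540318 has "
    "14.72 GiB memory in use. Of the allocated memory 13.91 GiB is allocated "
    "by PyTorch, and 729.25 MiB is reserved by PyTorch but unallocated."
)

_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"

# Hypothetical helper: field names are chosen here, not defined by PyTorch.
_PATTERNS = {
    "requested": rf"Tried to allocate {_SIZE}",
    "capacity": rf"total capacity of {_SIZE}",
    "free": rf"of which {_SIZE} is free",
    "allocated": rf"{_SIZE} is allocated by PyTorch",
    "reserved_unallocated": rf"{_SIZE} is reserved by PyTorch but unallocated",
}


def summarize_oom(message: str) -> dict:
    """Return the memory figures found in an OOM message, converted to MiB."""
    summary = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            summary[key] = float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]
    return summary


def configure_allocator(env=os.environ) -> str:
    """Set the allocator option from the log hint unless one is already set.

    Assumption: this runs before `import torch` in the entry point.
    """
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env["PYTORCH_CUDA_ALLOC_CONF"]


if __name__ == "__main__":
    figures = summarize_oom(OOM_MESSAGE)
    for name, mib in figures.items():
        print(f"{name}: {mib:.2f} MiB")
    print("PYTORCH_CUDA_ALLOC_CONF =", configure_allocator())
```

If the reserved-but-unallocated figure is small relative to the allocated one, as it is here, the more likely remedies are loading fewer or smaller weights or moving to hardware with more VRAM; which of those applies to this Space cannot be determined from the log alone.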
