🚀 Singularity-LTX-2.3_OmniCine_V1 Official Release

This is not just a standard fine-tune; it is a fundamental restructuring of the LTX-Video (2.3) generation logic.

I am thrilled to present the official release of LTX2.3 Singularity to the community. This optimization framework focuses on Image-to-Video (I2V), First & Last Frame Control, and Reference-to-Video generation. Although it has so far been trained for just under 100,000 steps (counted with gradient accumulation), its gains in physical consistency, dynamic motion, and cinematic expression have already far exceeded expectations.


🌟 Key Improvements

  • 🦴 Limbs & Anatomy Evolution: Specifically optimized to fix the common degradation of fingers and toes, drastically reducing anatomy warping and artifacts during fast movements.
  • 🎬 Injecting Shot Continuity: Achieved precise timeline-based shot and camera cuts controlled directly via text prompts (0-5s logical segments), saying goodbye to erratic, randomized framing.
  • 🗣️ Elimination of "AI Stiffness": Significantly enhanced facial expressiveness during speech, deeply optimized lip-syncing, and natively eliminated the rigid, burned-in subtitles frequently generated by the base model.
  • ⚖️ Physical Consistency: Improved the structural integrity of characters and environments during high-speed actions, suppressing chaotic "twisting/morphing" and aligning motions with real-world physics.
  • 🎨 Flawless Anime Compatibility: Integrated a high-quality Anime training dataset, allowing the model to seamlessly adapt across diverse styles including 2D anime, 3D CGI, and hyper-realism.
  • 🌪️ Extreme Dynamic Range: Delivers stellar performance in high-action sequences like running and combat sports. Simultaneously, visual effects for cyberpunk themes, transformations, magic casting, and monster rendering have been massively amplified.
  • 🖼️ Revolutionary Reference Image Control: Upgraded the "Reference-to-Video" capability. No longer bound to rigid first-frame constraints, the model intelligently extracts character features and artistic styles from the reference image, generating entirely new angles and compositions based on your prompts.

Effect Demonstration

Prompt: 8-second video, one-take shot, this high-octane cinematic animation features a young woman skateboarding down a steep coastal road during a bright, sunny afternoon, evoking a sense of freedom and adrenaline. From 0 to 3 seconds, the girl crouches low on her skateboard, her long brown hair streaming behind her as she accelerates down the incline. Between 4 and 8 seconds, she maintains a steady, aerodynamic posture, skillfully navigating the winding asphalt while her oversized white t-shirt billows in the wind, showing intense focus and excitement. The camera utilizes a low-angle tracking shot from directly behind the skater, moving at a high velocity to create an immersive sense of speed; the composition starts with a wide-angle view of the sprawling coastal city and ocean, then maintains a medium-close-up on the girl's back and the board throughout the descent. The setting is a picturesque Mediterranean-style hillside overlooking a turquoise bay under a vibrant blue sky filled with fluffy white clouds, featuring high-contrast sunlight casting sharp shadows and a warm, high-saturation color palette. The soundscape is dominated by the continuous, high-pitched whirring of skateboard wheels against the pavement and the rushing sound of wind, synchronized with an upbeat, rhythmic electronic background track. Significant motion blur is applied to the road surface and passing scenery to enhance the perception of speed, combined with a sharp focus on the central subject to maintain a polished, professional look, no subtitles.

Prompt: Modern anime style, office setting, weary and playful mood. 0-3s, a young man with messy black hair and sharp features wearing a black button-down shirt yawns deeply, eyes closed in exhaustion. 3-6s, he sighs and covers his face with both hands, rubbing his temples and eyes with a look of intense fatigue. 6-9s, he lowers his hands, looking startled as a female hand rests on his shoulder; he turns his head slightly with wide, tired eyes. 9-12s, a beautiful woman with long black hair and glowing red eyes leans in close to him, wearing a professional suit and a playful, mischievous smile. 12-15s, the man stares back at her with a heavy-lidded, slightly annoyed gaze, while she continues to smile at him. 0-8s, medium shot, eye-level, static camera focused on the man at his desk. 8-15s, dolly in to a tight close-up, focusing on the interaction between the two characters, shallow depth of field blurring the background. Modern office interior, bookshelves with files, laptop and coffee mug on the desk, soft fluorescent lighting, neutral color palette with high contrast on the characters' hair and eyes. 9-13s, the woman says: "Hey handsome, why so sleepy?" Voice: Playful and soft, Pace: Moderate. Audible yawn, soft rustle of clothing, low-volume office ambience, light and rhythmic background music. High-quality 2D animation, smooth motion, cinematic depth of field, sharp character lines, no subtitles.

Prompt: slow motion video, a woman with long gray hair, wearing a black tank top and leggings, stands in profile against a white background, she transitions into a fighting stance, raising her fists to protect her face, she then pivots on her feet, turning her body away from the camera while keeping her guard up, the sequence ends with her in a wide, stable stance facing to the right, ready for combat or a defensive maneuver

Prompt: Cinematic historical epic style, twilight battlefield, somber mood. 0-6 seconds, wounded warrior in armor sits leaning on debris, holding sword, panting, raising hand. 6-10 seconds, view of bloody hand, slight twitch. 0-6 seconds, medium shot, slight high angle, static. 6-10 seconds, extreme close-up, shallow focus. Ruined battlefield, firelight, thick smoke, low contrast, cold tone with warm highlights. 0-10 seconds, no character dialogue. Voice: Heavy breathing, Pace: Slow. Crackling fire SFX, dramatic BGM. Film grain, cinematic bokeh, no subtitles.


📊 Performance Showreel

| Evaluation Dimension | Performance Characteristics |
| --- | --- |
| Anatomy & Details | Highly stable finger and limb structures with a massive reduction in ghosting/artifacts. |
| Physical Motion | Smooth, fluid transitions adhering naturally to inertia and gravity. |
| Shot Transitions | Flawless cinematic cutting logic with precise timestamp orchestration. |
| Visual Aesthetics | Composition, lighting, and overall cinematic atmospheric depth are heavily enhanced. |
| Character Consistency | Natural facial expressions even in tight close-ups; prevents sudden "face-swapping." |
| High Dynamic Limits | Meets the vast majority of movement demands. Slight motion blur may still occur during extreme, highly complex actions. This is currently being addressed via optimized post-processing workflows; stay tuned! |

⚙️ Usage Guide

  • Recommended Base Model: ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors
  • ComfyUI Workflow: Uploaded and available in the Files tab of this repository. First & Last Frame Mode is highly recommended for maximum scene control.
  • Online Demo: Click here to try it online

πŸ“ Exclusive: Singularity Prompting Framework

This model follows a strict prompt structure to unlock its full cinematic potential. Please adhere closely to the "Cinematic Timeline Structure" below.

💡 Core Rule: Keep visual descriptions, timestamps, actions, and dialogue strictly formatted in English as shown below.

πŸ“ Output Template Structure

[Scene & Style]: Core visual description in one sentence (e.g., Cinematic wuxia style, dim lighting, Anime, 3D).
[Action Timeline]: 0-X seconds, [action / emotional description].
[Camera Timeline]: 0-X seconds, [camera movement / composition parameters].
[Environment]: Lighting source, contrast, and color grading details.
[Dialogue]: 0-X seconds, [Character] says: "[Dialogue text]".
[Audio & Technical]: Background sounds, film grain, subtitle exclusion commands, etc.
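
The six template fields above can be assembled programmatically. The following is a minimal sketch, not part of the model or the ComfyUI workflow: the class names `TimedBeat` and `SingularityPrompt`, the field names, and the default audio string are all hypothetical and chosen only for illustration. It joins the fields in template order into one prompt string.

```python
from dataclasses import dataclass, field


@dataclass
class TimedBeat:
    """One timeline entry: a time range in seconds plus its description."""
    start: int
    end: int
    text: str

    def render(self) -> str:
        # Produces e.g. "0-10 seconds, tight close-up, static camera."
        return f"{self.start}-{self.end} seconds, {self.text.rstrip('.')}."


@dataclass
class SingularityPrompt:
    """Hypothetical container mirroring the six template fields."""
    scene_style: str
    actions: list[TimedBeat]
    camera: list[TimedBeat]
    environment: str
    dialogue: list[TimedBeat] = field(default_factory=list)
    audio_technical: str = "Film grain, no subtitles"  # assumed default

    def build(self) -> str:
        # Emit the fields in the same order as the template.
        parts = [self.scene_style.rstrip(".") + "."]
        parts += [beat.render() for beat in self.actions]
        parts += [beat.render() for beat in self.camera]
        parts.append(self.environment.rstrip(".") + ".")
        parts += [beat.render() for beat in self.dialogue]
        parts.append(self.audio_technical.rstrip(".") + ".")
        return " ".join(parts)


if __name__ == "__main__":
    prompt = SingularityPrompt(
        scene_style="Cinematic wuxia style, indoor dim lighting, mysterious mood",
        actions=[TimedBeat(0, 10, "young man in ancient white robes looks down")],
        camera=[TimedBeat(0, 10, "tight close-up, static camera")],
        environment="Dark stone background, warm candlelight bokeh",
        dialogue=[TimedBeat(0, 10, 'man says: "What on earth is this?"')],
    )
    print(prompt.build())
```

Keeping each timeline entry as its own object makes it easy to add, reorder, or retime segments without hand-editing a long prompt string.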

---

#### 🎬 Example Prompt

Cinematic wuxia style, indoor dim lighting, mysterious mood. 0-10 seconds, young man in ancient white robes looks down with a confused expression. 0-10 seconds, tight close-up, static camera with slight handheld movement. Dark stone background, warm candlelight bokeh. 0-10 seconds, man says: "What on earth is this? I've never heard of it before.". Voice: low and confused, Pace: slow. Precise lip-sync, film grain, cinematic bokeh, no subtitles.
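
Because the model relies on timestamped segments, it can help to check a prompt's time ranges before generating. The sketch below is an illustration only: the helper names `extract_segments` and `find_gaps` are hypothetical, and the regular expression assumes ranges are written as `0-10 seconds` or `0-3s`. It pulls out every range and reports stretches of the clip that no segment covers.

```python
import re

# Matches ranges such as "0-10 seconds", "3-6s", or "0 - 5 sec".
# This pattern is an assumption about how timestamps are written.
SEGMENT_RE = re.compile(r"\b(\d+)\s*-\s*(\d+)\s*(?:seconds?|secs?|s)\b")


def extract_segments(prompt: str) -> list[tuple[int, int]]:
    """Return every (start, end) range found in the prompt, in order."""
    segments = []
    for match in SEGMENT_RE.finditer(prompt):
        start, end = int(match.group(1)), int(match.group(2))
        if end <= start:
            raise ValueError(f"Invalid segment {start}-{end}: end must exceed start")
        segments.append((start, end))
    return segments


def find_gaps(segments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the uncovered intervals between 0 and the latest end time."""
    gaps = []
    covered_until = 0
    for start, end in sorted(segments):
        if start > covered_until:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


if __name__ == "__main__":
    text = "0-3s, he yawns deeply. 5-8s, he lowers his hands."
    found = extract_segments(text)
    print(found)
    print(find_gaps(found))
```

Action, camera, and dialogue timelines share one prompt, so this check treats all ranges together; a gap only means that no timeline describes that stretch of the clip.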

---

# Contact Information
WeChat: aigctyd
Email: a592991299@gmail.com
