Generate UGC Video With Voice Clone

This ComfyUI tutorial workflow turns a single image into a short, lip‑synced UGC-style video with a cloned or preset voice. It chains three stages: prompt creation from your uploaded image, speech synthesis with ElevenLabs, and video generation with LTX‑2.3. The Create Prompt group routes the image from LoadImage into GeminiNode, which auto-generates two texts: a performance-ready speech script (with expression tags) and a scene description. RegexExtract then parses the Gemini response, cleanly separating the “speech” and “scene” sections for the downstream nodes.
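To make the parsing step concrete, here is a minimal offline sketch of what a RegexExtract-style split could look like. The section labels (`Speech:` / `Scene:`) and the response layout are assumptions for illustration, not the workflow's exact format:

```python
import re

# Hypothetical pattern mimicking the RegexExtract step: pull the "speech"
# and "scene" sections out of a single Gemini-style response. DOTALL lets
# each section span multiple lines.
SECTION_RE = re.compile(
    r"speech:\s*(?P<speech>.*?)\s*scene:\s*(?P<scene>.*)",
    re.IGNORECASE | re.DOTALL,
)

def split_response(response: str) -> dict:
    """Return {'speech': ..., 'scene': ...} parsed from the model response."""
    match = SECTION_RE.search(response)
    if match is None:
        raise ValueError("response did not contain speech/scene sections")
    return {key: value.strip() for key, value in match.groupdict().items()}

example = """Speech: [excited] This mug keeps coffee hot for hours!
Scene: A smiling person holds a ceramic mug toward the camera."""
parts = split_response(example)
```

The speech text (with its expression tags) then feeds the TTS stage, while the scene text feeds the video stage.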

For audio, you can pick a preset voice via ElevenLabsVoiceSelector or clone your own with ElevenLabsInstantVoiceClone, then synthesize the final narration in ElevenLabsTextToSpeech and optionally save it with SaveAudioMP3. The Create Video group feeds the scene description and the audio into the LTX‑2.3 video node (UUID: 98fb87e2-23b5-4ecb-aacc-365912414a12) to produce a talking-style video with accurate lip sync, previewed by PreviewAny and written out with SaveVideo. This setup is practical for rapid UGC, product explainers, testimonials, and social posts: it automates script writing from a single image and keeps the voice aligned with the lips through the ElevenLabs + LTX pairing.
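The voice-selection branch above (preset voice vs. instant clone) can be sketched as plain Python. This is an offline illustration only: the function name, fields, and default model id below are assumptions, not the real node signatures or the ElevenLabs SDK API.

```python
from typing import Optional

def build_tts_request(text: str,
                      preset_voice_id: Optional[str] = None,
                      cloned_voice_id: Optional[str] = None) -> dict:
    """Assemble the parameters a text-to-speech call would need.

    A freshly cloned voice (ElevenLabsInstantVoiceClone) takes precedence
    over a preset pick (ElevenLabsVoiceSelector), mirroring the workflow's
    either/or wiring. Field names here are illustrative assumptions.
    """
    voice_id = cloned_voice_id or preset_voice_id
    if voice_id is None:
        raise ValueError("select a preset voice or provide a voice clone")
    return {
        "voice_id": voice_id,
        "text": text,  # speech script, expression tags included
        # "eleven_multilingual_v2" is one of ElevenLabs' published model
        # ids; substitute whichever model the TTS node exposes.
        "model_id": "eleven_multilingual_v2",
    }

request = build_tts_request("[excited] This mug keeps coffee hot!",
                            preset_voice_id="Rachel")
```

The resulting audio, together with the scene description, is what the LTX‑2.3 node consumes to drive the lip-synced video.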