This ComfyUI workflow performs frame-accurate video face swapping using an LTX 2.3 LoRA trained by @Alissonerdx. It loads a target video, generates a swapped face layer for each frame, and blends that layer back onto the original frames with a controllable mask. The pipeline relies on VHS_LoadVideo to decode frames (and optionally read audio), a custom LTX face-swap node (dc72113f-8276-4a5a-af12-85d6bec89ed5) to apply the LoRA-driven identity transfer, and ReservedRegionFrameComposer to precisely composite the swapped face into each frame. ImageResizeKJv2 matches scales between the reference face and the target head, while PrimitiveFloat/PrimitiveInt nodes expose key parameters such as blend strength, mask feather, frame stride, and FPS.
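The masked blend described above boils down to per-pixel alpha compositing. The sketch below is an illustration only, not the node's actual implementation: `feather_mask` and `composite_face` are hypothetical helpers that show how a mask-feather radius and a blend-strength float (the parameters the Primitive nodes expose) would typically interact.

```python
import numpy as np

def feather_mask(mask: np.ndarray, feather: int) -> np.ndarray:
    """Soften a binary HxW mask by repeated 3x3 box blurs.
    A stand-in for the workflow's mask-feather parameter; the real
    node may use a Gaussian or distance-based falloff instead."""
    soft = mask.astype(np.float32)
    for _ in range(feather):
        padded = np.pad(soft, 1, mode="edge")
        # 3x3 box blur expressed as the mean of nine shifted copies
        soft = sum(
            padded[dy:dy + soft.shape[0], dx:dx + soft.shape[1]]
            for dy in range(3) for dx in range(3)
        ) / 9.0
    return soft

def composite_face(frame: np.ndarray, swapped: np.ndarray,
                   mask: np.ndarray, blend: float) -> np.ndarray:
    """Alpha-blend the swapped face layer onto the original frame.
    `blend` plays the role of the workflow's blend-strength float:
    0.0 keeps the original frame, 1.0 fully trusts the swap inside
    the mask."""
    alpha = (mask * blend)[..., None]  # HxWx1, broadcasts over RGB
    return frame * (1.0 - alpha) + swapped * alpha
```

At `blend=1.0` with a hard mask this is a straight cut-and-paste; feathering the mask and lowering the blend strength is what hides the seam at the face boundary.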

After per-frame compositing, the frames are assembled into a video with either CreateVideo (silent output) or VHS_VideoCombine (to mux the original audio back in). SaveVideo writes the final result to disk. Utility nodes such as ImageConcanate help align or batch intermediate image streams, and ComfyMathExpression handles simple math for sizing, timing, and index ranges. The result is a practical, controllable face-swap workflow that keeps the original motion, lighting, and scene intact while replacing identity consistently across frames.
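The sizing and index-range math that ComfyMathExpression handles is not spelled out in the workflow description, so the following is a sketch of two typical expressions: snapping a resized width/height to a model-friendly multiple (diffusion backbones commonly require dimensions divisible by 8 or 16; that requirement is an assumption here, not read from the workflow), and picking which frames to process given a frame stride. Both function names are hypothetical.

```python
def fit_to_multiple(width: int, height: int,
                    target_w: int, multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the width reaches target_w, keeping the
    aspect ratio and rounding both sides down to the nearest multiple."""
    scale = target_w / width
    new_w = int(width * scale) // multiple * multiple
    new_h = int(height * scale) // multiple * multiple
    return new_w, new_h

def frame_indices(total_frames: int, stride: int) -> list[int]:
    """Indices processed when sampling every `stride`-th frame,
    mirroring the workflow's frame-stride integer parameter."""
    return list(range(0, total_frames, stride))
```

For example, downscaling a 1920x1080 clip to a 960-wide working resolution yields 960x536 rather than 960x540, because 540 is not divisible by 8; the composited result is typically scaled back to the source resolution before VHS_VideoCombine muxes the audio in.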