Wan2.7: Image to Video

This ComfyUI tutorial walks through a workflow that turns a still image (or a pair of start/end images) and an audio track into a coherent video with sound using Wan2.7. The core is the Wan2ImageToVideoApi node, which takes your reference image(s) and audio, then synthesizes the in-between frames so the video begins on your first frame, transitions naturally, and lands on your last frame. LoadImage nodes feed the first and last control frames, while SaveVideo writes the final output to disk.
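As a rough illustration, here is what the node graph could look like in ComfyUI's API (prompt) JSON format. The node names LoadImage, Wan2ImageToVideoApi, and SaveVideo come from the tutorial; the LoadAudio node, the input names on Wan2ImageToVideoApi, and the node IDs are assumptions for the sketch and may differ from the real node's signature.

```python
import json

# Hypothetical sketch of the workflow graph in ComfyUI's API format.
# Each key is a node ID; link values are [source_node_id, output_slot].
# Input names on Wan2ImageToVideoApi are assumed, not documented here.
workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "first_frame.png"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": "last_frame.png"}},
    "3": {"class_type": "LoadAudio", "inputs": {"audio": "track.mp3"}},
    "4": {
        "class_type": "Wan2ImageToVideoApi",
        "inputs": {
            "first_frame": ["1", 0],  # link: node 1, output slot 0
            "last_frame": ["2", 0],
            "audio": ["3", 0],
        },
    },
    "5": {
        "class_type": "SaveVideo",
        "inputs": {"video": ["4", 0], "filename_prefix": "wan27_output"},
    },
}

# Submitting to a locally running ComfyUI server would look roughly like:
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:8188/prompt",
#     data=json.dumps({"prompt": workflow}).encode(),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```

In practice you would build and export this graph from the ComfyUI editor rather than writing the JSON by hand; the sketch only shows how the nodes wire together.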

Technically, the first and last frames act as strong compositional anchors: Wan2.7 preserves the subject and layout at those endpoints and generates motion between them. The audio you provide is used to align timing and pacing so the video duration matches the track and motion evolves in sync. Keep your start and end images aligned in aspect ratio and subject framing for cleaner transitions. The SaveVideo node then packages the generated frames (and the synchronized audio from the API) into a playable video file.
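Since mismatched framing between the two anchor frames is a common cause of awkward transitions, a quick pre-flight check on the image dimensions can help. This helper is purely illustrative (the function name and tolerance are not part of the Wan2.7 workflow); it just verifies that two (width, height) pairs share an aspect ratio.

```python
def aspect_ratios_match(size_a, size_b, tolerance=0.01):
    """Return True if two (width, height) pairs have near-identical
    aspect ratios. Illustrative sanity check for start/end frames;
    the 0.01 tolerance is an arbitrary choice."""
    w_a, h_a = size_a
    w_b, h_b = size_b
    return abs(w_a / h_a - w_b / h_b) <= tolerance
```

For example, a 1280x720 start frame and a 1920x1080 end frame both have a 16:9 ratio and pass the check, while pairing a landscape frame with a portrait one fails it, signaling that the transition is likely to warp or crop the subject.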