Wan 2.2 14B First-Last Frame to Video

This workflow introduces First–Last Frame to Video (FLF2V) for Wan 2.2 (14B), a conditioning method in which you supply both the opening and closing frames and the model synthesizes a smooth, prompt-guided transition between them. Instead of expanding from a single keyframe or relying on optical-flow interpolation, the WanFirstLastFrameToVideo node enforces boundary conditions at both ends of the clip (t=0 and t=1) inside the diffusion process. The start and end images are treated as fixed references, while the model generates new, coherent in-between frames that follow your text prompt.
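The boundary-condition idea can be illustrated with a small, purely conceptual Python sketch. It is not the WanFirstLastFrameToVideo implementation: the function names, the mask convention (1 = pinned to a supplied image, 0 = generated), and the placeholder file names are all assumptions made for illustration.

```python
from typing import List, Tuple


def boundary_mask(num_frames: int) -> List[int]:
    """Mark which frames are pinned to a supplied image.

    Convention assumed for this sketch only: 1 = pinned (first/last frame),
    0 = to be generated by the model.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames: one start and one end")
    mask = [0] * num_frames
    mask[0] = 1   # boundary condition at t=0
    mask[-1] = 1  # boundary condition at t=1
    return mask


def normalized_times(num_frames: int) -> List[float]:
    """Map frame indices onto the normalized clip time t in [0, 1]."""
    if num_frames < 2:
        raise ValueError("need at least two frames: one start and one end")
    return [i / (num_frames - 1) for i in range(num_frames)]


def build_timeline(first: str, last: str, num_frames: int) -> List[Tuple[float, str]]:
    """Pair each frame's time with its source: a supplied image or 'generated'."""
    mask = boundary_mask(num_frames)
    times = normalized_times(num_frames)
    sources = ["generated"] * num_frames
    sources[0] = first
    sources[-1] = last
    # Sanity check: every pinned slot holds a supplied image.
    assert all(src != "generated" for src, m in zip(sources, mask) if m == 1)
    return list(zip(times, sources))


if __name__ == "__main__":
    # Hypothetical file names, used only to make the output readable.
    for t, source in build_timeline("start.png", "end.png", 5):
        print(f"t={t:.2f}  {source}")
```

Only the two endpoint slots carry image data; everything in between is left for the model to fill in under the text prompt, which is the distinction from interpolating pixels between the two images.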

Under the hood, the workflow loads Wan 2.2 with UNETLoader, a compatible text encoder (UMT5 XXL FP8 scaled) with CLIPLoader, and the VAE, then samples with KSamplerAdvanced using an SD3-style schedule set by ModelSamplingSD3. An optional “lightning” 4-step LoRA cuts the number of sampling steps, and therefore render time, substantially at a modest cost in quality. The result is a practical, reproducible ComfyUI pipeline that improves on prior image-to-video approaches: conditioning jointly on both endpoints reduces identity drift, preserves scene intent, and produces transitions that feel authored rather than warped.
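To give a feel for what an SD3-style schedule does, the sketch below applies a timestep shift to a noise schedule. It assumes the commonly used flow-matching remap sigma' = shift * sigma / (1 + (shift - 1) * sigma), which is believed to be what ModelSamplingSD3's shift setting controls; treat both the formula and the uniform starting grid as assumptions rather than a description of this workflow's exact settings. The function names are hypothetical.

```python
from typing import List


def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1] with an SD3-style shift (assumed formula).

    shift = 1 leaves sigma unchanged; shift > 1 pushes intermediate values
    toward 1, so more of the schedule is spent at high noise levels.
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    if shift <= 0.0:
        raise ValueError("shift must be positive")
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> List[float]:
    """Return steps + 1 shifted noise levels, running from 1 (noise) to 0 (clean).

    The uniform grid used here is an assumption for illustration; the actual
    spacing depends on the scheduler selected in the sampler node.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [shift_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    # Compare an unshifted and a shifted 4-step schedule (shift value is illustrative).
    for shift in (1.0, 3.0):
        levels = ", ".join(f"{s:.3f}" for s in shifted_schedule(4, shift))
        print(f"shift={shift}: {levels}")
```

The endpoints stay fixed at 1 and 0 for any shift, so only the spacing of the intermediate steps changes. With very few steps, as with a 4-step LoRA, that spacing decides how the small step budget is split between high-noise and low-noise stages.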