Grok: Video generation - ComfyUI Workflow


This ComfyUI workflow generates short videos, up to 15 seconds long, using the Grok model, with an automatically synchronized audio track. At its core is the GrokVideoNode, which accepts either a text prompt alone (text-to-video) or a starting frame from LoadImage (image-to-video). The node handles inference and returns a ready-to-save video clip that pairs the visuals with audio produced by the model. SaveVideo then writes the result to disk as a standard video file, preserving the embedded audio when present.

Technically, the workflow is minimal and direct: LoadImage (optional) feeds an initial frame into GrokVideoNode, which synthesizes the motion and soundtrack from your prompt and/or reference image, and SaveVideo writes the output to a file. Capping the duration at 15 seconds keeps generation responsive and stays within the model’s capabilities. The result is a practical pipeline for rapid concepting, animating stills, or creating short social-ready clips without leaving ComfyUI.
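The chain described above can be sketched as a ComfyUI API-format graph, in which each node is an entry with a class_type and its inputs, and a link is written as [source_node_id, output_index]. This is only an illustration under stated assumptions: the helper name build_grok_video_workflow, the node ids, and the GrokVideoNode input names (prompt, duration, image) are hypothetical and do not come from the workflow itself; the SaveVideo input names are likewise assumed. Check the actual node definitions in your ComfyUI install before relying on any of them.

```python
import json
from typing import Optional

MAX_DURATION_SECONDS = 15  # cap stated in the workflow description


def build_grok_video_workflow(prompt: str,
                              duration_seconds: int = 6,
                              image_name: Optional[str] = None) -> dict:
    """Build the graph [LoadImage ->] GrokVideoNode -> SaveVideo as a dict."""
    if not prompt and image_name is None:
        raise ValueError("provide a text prompt, a starting image, or both")
    if not 1 <= duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )

    # Input names below are assumptions, not the node's documented schema.
    grok_inputs = {"prompt": prompt, "duration": duration_seconds}
    graph = {}

    if image_name is not None:
        # Image-to-video: LoadImage supplies the starting frame.
        graph["1"] = {"class_type": "LoadImage",
                      "inputs": {"image": image_name}}
        grok_inputs["image"] = ["1", 0]  # link to output 0 of node "1"

    graph["2"] = {"class_type": "GrokVideoNode", "inputs": grok_inputs}

    # SaveVideo receives the clip (with its audio) from GrokVideoNode.
    graph["3"] = {"class_type": "SaveVideo",
                  "inputs": {"video": ["2", 0],
                             "filename_prefix": "video/grok"}}
    return graph


# Text-to-video: no LoadImage node is added to the graph.
workflow = build_grok_video_workflow("a paper boat drifting down a rainy street",
                                     duration_seconds=10)
print(json.dumps(workflow, indent=2))
```

Passing image_name="still.png" switches the same helper to image-to-video by adding the LoadImage node and linking it into GrokVideoNode. A graph in this shape is what a ComfyUI server typically accepts when queued through its HTTP API, assuming the input names match the installed nodes.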