HuMo Video Generation

The HuMo Video Generation workflow produces videos in which a character's lip movements are synchronized to an audio track. It is built on the HuMo model, which is tailored for human-centric video generation and combines three input modalities (text, images, and audio) for a rich, customizable output.

The workflow's core nodes are CLIPTextEncode for encoding the text prompt, LoadImage for the character reference image, and AudioEncoderEncode for the audio input. Their outputs feed the WanHuMoImageToVideo node, which bridges the static reference image and dynamic video generation and aligns the character's lip movements with the audio track. Video-specific nodes, CreateVideo and SaveVideo, then assemble and write the final output. This workflow is well suited to video content that requires precise lip-syncing and visual customization.
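To illustrate how these nodes connect, here is a sketch of the graph in ComfyUI's API-style JSON format (node IDs mapping to a `class_type` and `inputs`, with links written as `["node_id", output_index]`). This is an assumption-laden sketch, not the actual workflow file: the input names on WanHuMoImageToVideo and AudioEncoderEncode are guesses, and the loader, sampler, decode, CreateVideo, and SaveVideo nodes (referenced here as IDs "10"–"12") are omitted for brevity.

```json
{
  "1": {"class_type": "LoadImage",
        "inputs": {"image": "reference_face.png"}},
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a person speaking to the camera",
                   "clip": ["10", 0]}},
  "3": {"class_type": "AudioEncoderEncode",
        "inputs": {"audio_encoder": ["11", 0],
                   "audio": ["12", 0]}},
  "4": {"class_type": "WanHuMoImageToVideo",
        "inputs": {"positive": ["2", 0],
                   "audio_encoder_output": ["3", 0],
                   "reference_image": ["1", 0]}}
}
```

In this shape, node 4's latent output would flow on to a sampler and then through CreateVideo and SaveVideo to produce the finished clip.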