Z-Image: Text to Image

The Z-Image: Text to Image workflow transforms textual descriptions into high-quality images with diverse aesthetics and strong photorealism. It is built around the Z-Image model, which responds well to negative prompts and supports fine-tuned control over its outputs, giving it high generation diversity. The workflow combines standard nodes such as SaveImage and MarkdownNote with a custom node (ID 9b9009e4-2d3d-445f-9be5-6063f465757e); together, these nodes encode the input text, generate the image, and save the result for further use.

Technically, the workflow uses the qwen_3_4b text encoder to interpret the input prompt and the z_image_bf16 diffusion model to generate an image from that encoding. The sampler is configured for 30 to 50 steps with a CFG (Classifier-Free Guidance) scale of 3 to 5, balancing creativity against adherence to the prompt. This makes the workflow a useful foundation for artists and designers, since both the prompt and the sampler settings can be tuned to meet specific aesthetic needs.
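As a rough illustration of how these pieces fit together, the sketch below builds a ComfyUI-style workflow graph in API (JSON) format with the sampler settings kept inside the recommended ranges. The loader node names (`UNETLoader`, `CLIPLoader`) and the model filenames are assumptions based on the description above, not read from the actual workflow file; only KSampler, CLIPTextEncode, and SaveImage are confirmed standard node types.

```python
import json


def build_workflow(prompt: str, negative: str,
                   steps: int = 40, cfg: float = 4.0) -> dict:
    """Build a minimal text-to-image graph; node wiring is [source_id, output_index]."""
    # Keep sampler settings inside the ranges the text recommends.
    assert 30 <= steps <= 50, "workflow is tuned for 30-50 steps"
    assert 3.0 <= cfg <= 5.0, "workflow is tuned for CFG 3-5"
    return {
        "1": {"class_type": "UNETLoader",   # assumed loader for the diffusion model
              "inputs": {"unet_name": "z_image_bf16.safetensors"}},
        "2": {"class_type": "CLIPLoader",   # assumed loader for the text encoder
              "inputs": {"clip_name": "qwen_3_4b.safetensors"}},
        "3": {"class_type": "CLIPTextEncode",   # positive prompt
              "inputs": {"clip": ["2", 0], "text": prompt}},
        "4": {"class_type": "CLIPTextEncode",   # negative prompt
              "inputs": {"clip": ["2", 0], "text": negative}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["3", 0],
                         "negative": ["4", 0], "steps": steps, "cfg": cfg}},
        "6": {"class_type": "SaveImage",
              "inputs": {"images": ["5", 0]}},
    }


wf = build_workflow("a misty harbor at dawn, photorealistic",
                    negative="blurry, low quality, watermark")
print(json.dumps(wf, indent=2))
```

In a real deployment this dictionary would be submitted to a running ComfyUI server, which resolves the node references and executes the graph; the point here is only the shape of the graph and where the step/CFG settings live.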