Qwen-Image: Text to Image

This workflow demonstrates text-to-image generation with Qwen-Image’s 20B MMDiT diffusion transformer inside ComfyUI. It centers on a dedicated Qwen-Image generation node (UUID: e5cfe5ba-2ae0-4bc4-869f-ab2228cb44d3), which loads the Qwen-Image model weights and produces an image directly from your prompt. A MarkdownNote node provides inline guidance within the graph, and a SaveImage node writes the final output to your ComfyUI output directory.
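The graph described above can also be driven programmatically through ComfyUI's HTTP API, which accepts a JSON graph via `POST /prompt`. The sketch below is a minimal illustration of that payload shape, not the exported workflow itself: the `QwenImageGenerate` class name and its input fields are assumptions (this page names only the node's role, plus SaveImage and MarkdownNote), so check the actual exported JSON for the real identifiers.

```python
import json

# Hypothetical graph mirroring the workflow's structure: a Qwen-Image
# generation node feeding a SaveImage node. "QwenImageGenerate" and its
# inputs are assumed names, not confirmed by this page.
graph = {
    "1": {
        "class_type": "QwenImageGenerate",  # assumed class name
        "inputs": {
            "prompt": "A poster with the headline 'GRAND OPENING'",
            "width": 1328,   # 1:1 preset from this workflow
            "height": 1328,
            "steps": 20,
        },
    },
    "2": {
        "class_type": "SaveImage",
        "inputs": {
            # ["1", 0] = output slot 0 of node "1", ComfyUI's link convention
            "images": ["1", 0],
            "filename_prefix": "qwen_image",
        },
    },
}

# ComfyUI's API expects the graph wrapped under a "prompt" key.
payload = json.dumps({"prompt": graph})
```

Submitting `payload` to a running ComfyUI instance at `POST /prompt` would queue the generation; the SaveImage node then writes the result to the output directory as described above.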

Technically, the Qwen-Image node runs the 20B MMDiT model end-to-end, handling prompt conditioning and diffusion steps internally. The official model files are available from the Comfy-Org/Qwen-Image_ComfyUI repositories (Hugging Face or Modelscope). The workflow supports practical presets for aspect ratios and resolutions (e.g., 1328×1328 for 1:1, 1664×928 for 16:9, 928×1664 for 9:16, 1472×1140 for 4:3). On an RTX 4090D 24GB, the FP8 e4m3fn configuration has been observed to complete a first image in roughly 90 seconds and subsequent images faster, depending on settings. Qwen-Image is designed to render multilingual text inside images, making it well-suited for posters, signage, UI mocks, and other graphics where readable text is important.
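The resolution presets listed above can be captured in a small helper that picks the closest preset for a desired aspect ratio. This is a convenience sketch, not part of the workflow; the `closest_preset` function is hypothetical, but the width/height pairs come straight from the presets named in this page.

```python
# Resolution presets from this workflow, keyed by aspect ratio.
PRESETS = {
    "1:1": (1328, 1328),
    "16:9": (1664, 928),
    "9:16": (928, 1664),
    "4:3": (1472, 1140),
}

def closest_preset(width: int, height: int) -> tuple[int, int]:
    """Return the preset whose aspect ratio best matches width/height.

    Hypothetical helper: compares the target ratio against each preset's
    ratio and returns the nearest match.
    """
    target = width / height
    return min(PRESETS.values(), key=lambda wh: abs(wh[0] / wh[1] - target))
```

For example, a 1920×1080 request maps to the 16:9 preset (1664×928), keeping generation within the resolutions the model was tuned for.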