
This workflow turns text prompts into music using the ACE-Step 1.5 XL Base (4B) model inside ComfyUI. It pairs UNETLoader with the acestep_v1.5_xl_base_bf16.safetensors diffusion model and VAELoader with the ace_1.5_vae.safetensors decoder. Your prompt is encoded by TextEncodeAceStepAudio1.5, an EmptyAceStep1.5LatentAudio node creates a blank latent audio clip at your chosen duration, and KSampler performs the denoising pass to synthesize music. VAEDecodeAudio reconstructs the waveform from latents, and SaveAudioMP3 writes the final track to disk.
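The node graph described above can be sketched in ComfyUI's API (JSON prompt) format, which maps each node to a `class_type` plus an `inputs` dict whose `["node_id", output_index]` pairs wire nodes together. This is an illustrative sketch only: the node IDs, input field names, and the example prompt/settings below are assumptions, not verified ACE-Step defaults; export your own workflow with "Save (API Format)" to get the exact wiring.

```python
# Hypothetical sketch of the ACE-Step workflow in ComfyUI's API (JSON prompt)
# format. Node IDs and input field names are illustrative assumptions; use
# "Save (API Format)" in ComfyUI to obtain the real field names.

graph = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "acestep_v1.5_xl_base_bf16.safetensors"}},
    "2": {"class_type": "VAELoader",
          "inputs": {"vae_name": "ace_1.5_vae.safetensors"}},
    "3": {"class_type": "TextEncodeAceStepAudio1.5",
          # Example prompt text; the actual input names may differ.
          "inputs": {"prompt": "mellow lo-fi hip hop, warm piano, vinyl crackle"}},
    "4": {"class_type": "EmptyAceStep1.5LatentAudio",
          "inputs": {"seconds": 120, "batch_size": 1}},
    "5": {"class_type": "ModelSamplingAuraFlow",
          "inputs": {"model": ["1", 0], "shift": 3.0}},  # shift value is a guess
    "6": {"class_type": "ConditioningZeroOut",
          "inputs": {"conditioning": ["3", 0]}},  # empty-negative fallback
    "7": {"class_type": "KSampler",
          "inputs": {"model": ["5", 0], "positive": ["3", 0],
                     "negative": ["6", 0], "latent_image": ["4", 0],
                     "seed": 0, "steps": 30, "cfg": 5.0,
                     "sampler_name": "euler", "scheduler": "simple",
                     "denoise": 1.0}},
    "8": {"class_type": "VAEDecodeAudio",
          "inputs": {"samples": ["7", 0], "vae": ["2", 0]}},
    "9": {"class_type": "SaveAudioMP3",
          "inputs": {"audio": ["8", 0], "filename_prefix": "audio/acestep"}},
}

# Sanity check: every node-to-node link must reference an existing node id.
for node in graph.values():
    for value in node["inputs"].values():
        if isinstance(value, list):
            assert value[0] in graph, f"dangling link to node {value[0]}"
```

A dict like this is what ComfyUI accepts when a workflow is submitted programmatically, so keeping the link check handy catches miswired graphs before queuing a generation.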
Under the hood, ConditioningZeroOut supplies a zeroed-out conditioning as a clean fallback when the negative prompt is empty, while ModelSamplingAuraFlow applies the flow-matching shift and schedule the model expects, so KSampler can produce stable, on-style results. PrimitiveNode and PrimitiveInt nodes expose simple controls for duration, steps, guidance (CFG), and seed. The workflow is organized into clear groups (Model, Duration, Prompt), so you can quickly load the right weights, set the clip length, write a prompt, and then iterate by adjusting steps, CFG, and seed.
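ModelSamplingAuraFlow is driven by a single `shift` parameter. A common flow-matching timestep shift used by AuraFlow-style models remaps each noise level sigma as sigma' = shift * sigma / (1 + (shift - 1) * sigma); the sketch below shows that mapping as an assumption about the general technique, not as ACE-Step's verified internals.

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Flow-matching shift (assumed AuraFlow-style form): remaps sigma in
    [0, 1] toward higher noise levels when shift > 1, biasing sampling
    toward the coarse-structure steps of the schedule."""
    return shift * sigma / (1 + (shift - 1) * sigma)

# Endpoints are preserved; mid-range sigmas are pushed upward when shift > 1.
print(shift_sigma(0.0))              # -> 0.0
print(shift_sigma(1.0))              # -> 1.0
print(shift_sigma(0.5))              # -> 0.75 with the default shift=3.0
print(shift_sigma(0.5, shift=1.0))   # -> 0.5 (shift=1.0 is the identity)
```

Raising the shift spends more of the step budget at high noise, which is why this one knob can noticeably change how "on-style" the sampled audio comes out.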