
This workflow turns a plain-language text prompt into finished music using the ACE-Step 1.5 XL Turbo model. Your prompt is encoded by TextEncodeAceStepAudio1.5 (backed by DualCLIPLoader) into audio-aware conditioning. EmptyAceStep1.5LatentAudio creates a blank latent audio canvas at your chosen length, and UNETLoader loads the distilled 4B ACE-Step checkpoint (acestep_v1.5_xl_turbo_bf16.safetensors). ModelSamplingAuraFlow configures the sampler schedule tuned for ACE-Step, and KSampler performs just 8 diffusion steps to synthesize the latent audio. The audio is then reconstructed with VAELoader + VAEDecodeAudio and saved via SaveAudioMP3.
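The node chain above can be driven programmatically once the graph is exported via ComfyUI's "Save (API format)" option. The sketch below is a minimal, hedged illustration: it patches the editable fields mentioned in this workflow (seed, steps, duration, prompt) and posts the graph to ComfyUI's standard `/prompt` endpoint. The node IDs and input field names (`seconds`, `tags`) are assumptions that vary per export — check your own `workflow_api.json` for the real ones.

```python
import json
import urllib.request

def patch_workflow(graph: dict, *, seed: int, steps: int,
                   seconds: float, prompt: str) -> dict:
    """Overwrite the quick-edit fields described in the workflow text.

    Field names below are assumptions; inspect your exported
    workflow_api.json for the actual input names.
    """
    for node in graph.values():
        ct = node.get("class_type", "")
        if ct == "KSampler":
            node["inputs"]["seed"] = seed
            node["inputs"]["steps"] = steps        # 8 turbo steps
        elif ct == "EmptyAceStep1.5LatentAudio":
            node["inputs"]["seconds"] = seconds    # assumed field name
        elif ct == "TextEncodeAceStepAudio1.5":
            node["inputs"]["tags"] = prompt        # assumed field name
    return graph

def queue_prompt(graph: dict, server: str = "http://127.0.0.1:8188") -> None:
    """POST the patched graph to a local ComfyUI server's /prompt endpoint."""
    req = urllib.request.Request(
        server + "/prompt",
        data=json.dumps({"prompt": graph}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget; poll /history for results

# Example with a two-node stand-in for the exported graph:
demo = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20}},
    "5": {"class_type": "EmptyAceStep1.5LatentAudio",
          "inputs": {"seconds": 60.0}},
}
patched = patch_workflow(demo, seed=42, steps=8,
                         seconds=120.0, prompt="warm lo-fi piano")
```

Patching the JSON rather than rebuilding the graph keeps the wiring between nodes intact, which is why the quick-edit `PrimitiveInt`/`PrimitiveNode` values are the natural targets.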
What makes this workflow practical is its “turbo” configuration: classifier-free guidance (CFG) is effectively disabled, with ConditioningZeroOut supplying an intentionally empty negative conditioning so only the positive prompt steers generation. This keeps prompts straightforward (describe only what you want), while ModelSamplingAuraFlow and KSampler deliver fast, stable results. The graph is organized into Model and Prompt groups for clarity, and uses PrimitiveInt/PrimitiveNode for quick edits to seed, steps, and duration. The result is a tight, teachable text-to-music pipeline you can run repeatedly for variations or integrate as a reusable subgraph.
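To see why the zeroed-out negative conditioning is harmless here, it helps to write out the standard classifier-free guidance mix. This is a plain-float sketch of the general CFG formula, not ACE-Step or ComfyUI code: at a guidance scale of 1.0 the unconditional (negative) term cancels exactly, so the sampler's output is just the positive prediction.

```python
def cfg_mix(cond_pred: float, uncond_pred: float, cfg: float) -> float:
    """Standard classifier-free guidance: uncond + cfg * (cond - uncond)."""
    return uncond_pred + cfg * (cond_pred - uncond_pred)

# At cfg = 1.0 the formula collapses to cond_pred, so the empty
# negative conditioning from ConditioningZeroOut never affects the result:
print(cfg_mix(0.8, 0.0, 1.0))   # 0.8 — identical to the positive prediction
print(cfg_mix(0.8, 0.0, 7.5))   # 6.0 — higher cfg amplifies the difference
```

This is also why turbo-style distilled models are fast twice over: beyond needing only 8 steps, running without guidance means the sampler can skip the second (negative) model evaluation each step.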