ACE-Step 1.5 Music Generation AIO is a text-to-audio ComfyUI workflow that turns style tags and lyrics into a complete song. You provide short descriptors like "modern pop, female vocal, lush synths" plus your lyrics, and the graph handles the rest (sampling, decoding, and exporting to MP3), so you can audition ideas in seconds on consumer hardware. It is organized into three labeled groups, Step 1: Load Model, Step 2: Duration, and Step 3: Prompt, so you always know where to set the checkpoint, how long the track should be, and what the model should sing or sound like.
Under the hood, CheckpointLoaderSimple loads the ACE-Step 1.5 checkpoint (ace_step_1.5_turbo_aio.safetensors). TextEncodeAceStepAudio1.5 converts your style tags and lyrics into ACE-specific conditioning. ConditioningZeroOut produces a zeroed copy of that conditioning for any input you are not using (for example, the negative input when you provide no negative prompt), so an empty prompt does not bias the result. EmptyAceStep1.5LatentAudio creates an empty latent audio canvas for the exact duration you request. ModelSamplingAuraFlow patches the model's sampling behavior (its timestep shift) to suit ACE-Step 1.5, and KSampler, where the sampler and scheduler are selected, iteratively refines the latent into a coherent track based on your prompt. VAEDecodeAudio turns the final latent into waveform audio, and SaveAudioMP3 writes the result to your ComfyUI output folder. PrimitiveNode widgets expose the practical controls (steps, CFG, seed, and seconds) so you can trade off speed, quality, and reproducibility.
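The node chain described above can be sketched as a ComfyUI API-format graph, where each node is an entry with a class_type and inputs, and a link is written as [source_node_id, output_index]. This is a minimal illustration, not the shipped workflow file: the function name build_ace_step_graph is hypothetical, the input names on TextEncodeAceStepAudio1.5 and EmptyAceStep1.5LatentAudio are assumptions (check the node definitions in your install), and the numeric values (steps, CFG, shift, sampler, scheduler) are placeholders rather than the workflow's actual defaults.

```python
def build_ace_step_graph(tags, lyrics, seconds=30.0, steps=8, cfg=1.0, seed=0):
    """Return an API-format graph: {node_id: {"class_type": ..., "inputs": ...}}."""
    return {
        # Step 1: Load Model. Outputs are MODEL (0), CLIP (1), VAE (2).
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "ace_step_1.5_turbo_aio.safetensors"}},
        # Step 3: Prompt. Input names "tags" and "lyrics" are assumed.
        "2": {"class_type": "TextEncodeAceStepAudio1.5",
              "inputs": {"clip": ["1", 1], "tags": tags, "lyrics": lyrics}},
        # Zeroed copy of the conditioning, used as the negative input.
        "3": {"class_type": "ConditioningZeroOut",
              "inputs": {"conditioning": ["2", 0]}},
        # Step 2: Duration. Input names "seconds" and "batch_size" are assumed.
        "4": {"class_type": "EmptyAceStep1.5LatentAudio",
              "inputs": {"seconds": seconds, "batch_size": 1}},
        # Patches the model's sampling; the shift value is a placeholder.
        "5": {"class_type": "ModelSamplingAuraFlow",
              "inputs": {"model": ["1", 0], "shift": 3.0}},
        # Sampler and scheduler are chosen here; both are placeholders.
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["5", 0],
                         "positive": ["2", 0],
                         "negative": ["3", 0],
                         "latent_image": ["4", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        # Latent -> waveform, using the VAE from the checkpoint.
        "7": {"class_type": "VAEDecodeAudio",
              "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        # Writes the MP3 under the ComfyUI output folder.
        "8": {"class_type": "SaveAudioMP3",
              "inputs": {"audio": ["7", 0],
                         "filename_prefix": "audio/ace_step",
                         "quality": "V0"}},
    }
```

Calling build_ace_step_graph("modern pop, female vocal, lush synths", "your lyrics here", seconds=20) returns a plain dictionary. A running ComfyUI server typically accepts such a graph as the "prompt" field of a JSON body posted to its /prompt endpoint, though for everyday use the graphical workflow does the same wiring for you.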
This setup is useful for fast songwriting demos, genre exploration, and background music beds for content. Duration is set explicitly in seconds, and keeping the seed fixed lets you reproduce a result with the same settings. The graph ships with sensible defaults so you can generate a short track quickly, then push for higher quality by raising steps or refining your style tags and lyric phrasing.
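The draft-then-refine loop can also be scripted against a workflow exported in API format. The sketch below assumes that format (a dictionary of nodes, each with class_type and inputs). The helper with_overrides is hypothetical, and the tiny draft dictionary is a stand-in for a real export, with placeholder values.

```python
import copy

def with_overrides(workflow, class_type, **inputs):
    """Return a copy of the workflow with inputs overridden on nodes of class_type."""
    patched = copy.deepcopy(workflow)  # leave the original draft untouched
    matched = 0
    for node in patched.values():
        if node.get("class_type") == class_type:
            node["inputs"].update(inputs)
            matched += 1
    if matched == 0:
        raise KeyError(f"no node of class_type {class_type!r} in workflow")
    return patched

# Stand-in for an exported workflow; the values are placeholders.
draft = {"6": {"class_type": "KSampler",
               "inputs": {"seed": 42, "steps": 8, "cfg": 1.0}}}

# Same seed, more steps: a higher-effort pass over the same starting noise.
final = with_overrides(draft, "KSampler", steps=16)
```

Keeping the seed fixed while changing one setting at a time makes it easier to hear what that setting contributes. To explore alternatives instead, hold the settings and vary only the seed.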