The first take is the model's best guess. The second take is yours.
When you hit regenerate, you are no longer asking for "a song about late-night drives." You are asking for "a song about late-night drives, but slower than the last one, with a chorus that does not land on the downbeat." Even before you touch the prompt, your ear has already done the editing, and the next generation inherits that edit through the small adjustments you make to genre, tempo, mood, or the lyric draft.
The bias of the first take
Models tend to give you the middle of what your prompt allows. If your prompt allows ten tempos, you will get one near the median. If it allows three moods, you will get the most predictable one. The first take is rarely wrong, but it is rarely surprising either, because surprise sits at the edges of the prompt and the model tends to head for the middle.
Use take one as a question
Treat the first generation as a question, not an answer. The question is: "Is this where I wanted the song to be?" Almost always the answer is "close, but —" and the but is the most useful piece of information in the whole session. Edit one parameter that addresses the but, and regenerate.
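One way to keep yourself honest about "edit one parameter" is to make the prompt a small record and refuse to change more than one field per take. The sketch below is only an illustration: SongPrompt, its fields, and revise are hypothetical names, not part of any real music tool, and the values are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SongPrompt:
    """Hypothetical prompt record; the fields are assumptions for illustration."""
    subject: str
    genre: str
    tempo_bpm: int
    mood: str


def revise(prompt: SongPrompt, **change) -> SongPrompt:
    """Return a copy of the prompt with exactly one parameter changed."""
    if len(change) != 1:
        raise ValueError("change exactly one parameter per take")
    return replace(prompt, **change)


# Take one: the model's best guess at this prompt.
take_one = SongPrompt("late-night drives", "synth-pop", 110, "wistful")

# The "but" was "close, but slower", so only the tempo moves.
take_two = revise(take_one, tempo_bpm=92)
```

Because only one field differs between takes, whatever changes in the result can be traced back to that one edit.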
Stop at three
Three takes is usually enough. By take four you are no longer refining the song; you are gambling that the model will hand you something better than what you already have. It rarely will, because by then the prompt has stopped changing in any way that matters. If take three is not where you want it, the prompt needs surgery, not another roll of the dice.
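The three-take rule can be written down as a loop with a hard cap. This is a rough sketch, not a real integration: run_session and the stand-in functions below are hypothetical, and in practice "generate" would be whatever tool you use and "accept" would be your own ear.

```python
from typing import Callable, Optional, Tuple

MAX_TAKES = 3  # the stopping point suggested above


def run_session(
    prompt: dict,
    generate: Callable[[dict], str],
    accept: Callable[[str], bool],
    revise: Callable[[dict, str], dict],
) -> Optional[Tuple[int, str]]:
    """Try up to MAX_TAKES generations, revising the prompt after each miss.

    Returns (take_number, song) on success, or None when the cap is hit,
    which is the signal to rewrite the prompt rather than roll again.
    """
    for take in range(1, MAX_TAKES + 1):
        song = generate(prompt)
        if accept(song):
            return take, song
        prompt = revise(prompt, song)
    return None


# Stand-ins so the sketch runs on its own; all three are made up.
def fake_generate(prompt: dict) -> str:
    return f"{prompt['genre']} at {prompt['tempo_bpm']} bpm"


def slow_enough(song: str) -> bool:
    return song.endswith(" 92 bpm")


def slow_down(prompt: dict, song: str) -> dict:
    # One small, targeted edit per take: drop the tempo by 9 bpm.
    return {**prompt, "tempo_bpm": prompt["tempo_bpm"] - 9}


result = run_session(
    {"genre": "synth-pop", "tempo_bpm": 110}, fake_generate, slow_enough, slow_down
)
```

A None result is not a failure of the loop; it means the prompt itself needs rework before any further takes are worth spending.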