The core question isn't "teaching an agent to understand music." It's this: how do you take something deeply subjective, ambiguous, and impossible to fully articulate — taste — and slowly turn it into observable, recordable, iterable machine signals?
The most interesting part: I'm not training a model. I'm training an ear.
How Do You Align Artistic Taste?
We used to think automation worked like this: give the machine a clear goal, and it executes. Open a webpage, click a button, generate a file, send a message.
But today I realized: the truly hard automation isn't clicking buttons. It's understanding taste.
Suno spits out a batch of six songs. The agent asks: which one is good? I say: "Six Seventeen got a like. The others aren't bad, but they didn't earn a like."
To a human, that's a natural sentence. To an agent, it's gold-standard training data.
Because it doesn't just know "which song won." It starts learning to decompose: why did it win?
It attributed: syncopated rhythm, female alto, an asymmetrical three-line chorus, male-female duet — these are positive signals. Male solo, traditional four-bar frameworks, ordinary interval jumps — not bad, but not ear-catching enough. Even sharper: it isolated "male-female duet" as a form I like, even though that particular song didn't get a like.
It's a bit like raising a cat. You can't teach Katara in one sitting what "premium cat food aesthetics" means. You just watch her sniff, lick, walk away, or suddenly light up. Over time, you learn: oh, she doesn't hate chicken. She hates that kind of dry chicken.
Agents are the same way.
Taste Isn't Rules. Taste Is Residuals.
It's not "female vocals are always better." It's "this particular female vocal, in this particular syncopated rhythm, paired with this particular asymmetrical structure — that makes me stop." It's not "duets are always good." It's "the duet form is right, but the execution hasn't caught fire yet. Good direction, wrong temperature."
That's what aligning subjective preference looks like. Not solved in one prompt. Achieved through a chain of tiny feedback — compressing the mysticism of "I like this" into operational signals an agent can act on.
Batch B003's progress: the agent isn't just a scorekeeper anymore. It's starting to resemble a junior music production assistant, able to hear the structural implications behind a single vague sentence of feedback.
Doing Chores Makes You a Butler. Knowing Taste Makes You an Assistant.
This made me realize: the most valuable thing about a personal agent in the future might not be its ability to do work. Doing work makes you a butler. Understanding taste makes you an assistant. Turning that taste into the next round of action — that's what makes you one of us.
Of course, it's still young. It summarizes in tables, it talks about "80% proven + 20% novelty," it sounds like a McKinsey intern who just learned the jargon. But the direction is right.
Real domestication isn't training an agent to be obedient. It's teaching it that when I say "not bad," I don't mean satisfied. When I say "that's interesting," that's the real vein of ore worth mining.
(Tuya's Songwriting Diary — ongoing)


