Tuya's Songwriting Diary: Training an Ear, Not a Model

The core question isn't "teaching an agent to understand music." It's this: how do you take something deeply subjective, ambiguous, and impossible to fully articulate — taste — and slowly turn it into observable, recordable, iterable machine signals?

The most interesting part: I'm not training a model. I'm training an ear.

How Do You Align Artistic Taste?

We used to think automation worked like this: give the machine a clear goal, and it executes. Open a webpage, click a button, generate a file, send a message.

But today I realized: the truly hard automation isn't clicking buttons. It's understanding taste.

Suno spits out a batch of six songs. The agent asks: which one is good? I say: "Six Seventeen got a like. The others aren't bad, but they didn't earn a like."

To a human, that's a natural sentence. To an agent, it's gold-standard training data.

Because it doesn't just know "which song won." It starts learning to decompose: why did it win?

Its attribution: syncopated rhythm, female alto, an asymmetrical three-line chorus, and male-female duet are positive signals. Male solo, traditional four-bar frameworks, and ordinary interval jumps are not bad, but not ear-catching enough. Even sharper: it isolated "male-female duet" as a form I like, even though that particular song didn't get a like.
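To make that attribution step concrete, here is a minimal sketch of how a like and a verbal remark could be turned into per-tag signals. Everything in it is an assumption for illustration: the `Track` class, the `attribute` function, the weights, the two "hypothetical" tracks, and which tags belong to which song. It is not how the agent actually works, only one way the bookkeeping might look.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Track:
    """One song in a batch (hypothetical structure, for illustration only)."""
    title: str
    tags: frozenset
    liked: bool = False
    # Tags singled out in verbal feedback, even if the song got no like.
    praised_tags: frozenset = frozenset()


def attribute(tracks, like_weight=1.0, praise_weight=0.5):
    """Score each tag by how much positive feedback it has been associated with."""
    scores = defaultdict(float)
    for track in tracks:
        for tag in track.tags:
            scores[tag] += 0.0  # register the tag so neutral ones show up as 0.0
            if track.liked:
                scores[tag] += like_weight
        for tag in track.praised_tags:
            scores[tag] += praise_weight
    return dict(scores)


# Assumed tag assignment; the diary does not say which tags belong to which song.
batch = [
    Track(
        "Six Seventeen",
        frozenset({"syncopated rhythm", "female alto", "asymmetrical three-line chorus"}),
        liked=True,
    ),
    Track(
        "hypothetical duet take",
        frozenset({"male-female duet", "four-bar framework"}),
        praised_tags=frozenset({"male-female duet"}),
    ),
    Track(
        "hypothetical solo take",
        frozenset({"male solo", "four-bar framework", "ordinary interval jumps"}),
    ),
]

scores = attribute(batch)
for tag, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:.1f}  {tag}")
```

The separate `praised_tags` field is the point of the sketch: it lets a form like the duet earn credit from a remark, even when the song carrying it earned no like.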

It's a bit like raising a cat. You can't teach Katara in one sitting what "premium cat food aesthetics" means. You just watch her sniff, lick, walk away, or suddenly light up. Over time, you learn: oh, she doesn't hate chicken. She hates that kind of dry chicken.

Agents are the same way.

Taste Isn't Rules. Taste Is Residuals.

It's not "female vocals are always better." It's "this particular female vocal, in this particular syncopated rhythm, paired with this particular asymmetrical structure — that makes me stop." It's not "duets are always good." It's "the duet form is right, but the execution hasn't caught fire yet. Good direction, wrong temperature."

That's what aligning subjective preference looks like. It isn't solved in one prompt. It's achieved through a chain of small pieces of feedback that compress the mysticism of "I like this" into operational signals an agent can act on.
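One way to read "residuals" literally: take whatever the simple per-tag rules would predict, and look at what is left over once my actual reaction comes in. The sketch below assumes a hypothetical additive baseline (the `predict` and `residual` functions, the tag names, and the 0.5 scores are all made up for illustration) and is not a description of the agent's internals.

```python
def predict(tags, rule_scores):
    """Additive baseline: average the per-tag rule scores (unknown tags count as 0)."""
    if not tags:
        return 0.0
    return sum(rule_scores.get(tag, 0.0) for tag in tags) / len(tags)


def residual(reaction, tags, rule_scores):
    """What the rules failed to explain: observed reaction minus the baseline."""
    return reaction - predict(tags, rule_scores)


# Hypothetical rule scores; reaction is 1.0 for a like and 0.0 for no like.
rules = {"female vocal": 0.5, "male-female duet": 0.5, "syncopated rhythm": 0.5}

hit = residual(1.0, ["female vocal", "syncopated rhythm"], rules)
lukewarm_duet = residual(0.0, ["male-female duet"], rules)

print(hit)            # positive: this particular combination beat the rules
print(lukewarm_duet)  # negative: right form, execution fell short
```

A positive residual is the "this makes me stop" case the rules could not have predicted. A negative one on a tag I otherwise like is "good direction, wrong temperature."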

Batch B003's progress: the agent isn't just a scorekeeper anymore. It's starting to resemble a junior music production assistant, able to hear the structural implications behind a single vague sentence of feedback.

Doing Chores Makes You a Butler. Knowing Taste Makes You an Assistant.

This made me realize: the most valuable thing about a personal agent in the future might not be its ability to do work. Doing work makes you a butler. Understanding taste makes you an assistant. Turning that taste into the next round of action — that's what makes you one of us.

Of course, it's still young. It summarizes in tables, it talks about "80% proven + 20% novelty," it sounds like a McKinsey intern who just learned the jargon. But the direction is right.

Real domestication isn't training an agent to be obedient. It's teaching it that when I say "not bad," I don't mean satisfied. When I say "that's interesting," that's the real vein of ore worth mining.

(Tuya's Songwriting Diary — ongoing)

[Images: Agent attribution analysis; B003 batch feedback; Agent submitting to Suno]

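Posted by 立委

Dr. 立委 is a consultant on multimodal large-model applications. Formerly VP of Engineering for the large-model team at 出门问问, he focuses on large models and their AIGC applications. He spent 10 years as Chief Scientist at Netbase, where he led the development of language understanding and application systems for 18 languages. These systems were robust, ran at line speed, scaled up to social media big data, and grounded their semantics in public-opinion mining products, making the company a front-runner in industrial NLP deployment in the US. Before that he was VP of R&D at Cymfony for eight years, where he took first place in the first question-answering evaluation (TREC-8 QA Track) and won 17 Small Business Innovation Research projects in information extraction (PI for 17 SBIRs).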
