When Code Is No Longer a Moat, What Is?

I recently came across a striking take.

Boris, the creator of Claude Code (CC), recently said that programming has been "pretty much solved."

It sounds absolute. But if you've been using LLMs to write code these past two years, you know what he means: not fully solved, but past the threshold where you "have to write it yourself."

Which raises the question: If writing code is no longer scarce, what is?

The knee-jerk reaction: is the software industry about to be flattened? Is SaaS doomed?

But look closer, and in places you'll find the opposite. There are some moats AI still can't touch, at least for now.

AI is rapidly dismantling moats we once took for granted.

Take switching costs.

You used to get locked into a system: data won't migrate, APIs don't match, your team doesn't know the new tool. Now, an agent can migrate your data, write adapters, even "learn" the new system for you. Switching platforms went from an engineering project to a task.

Or take process barriers.

Many companies' edge wasn't in the product — it was in the process: a complex, internal-only way of doing things that outsiders couldn't replicate.

Today, you throw a goal at a model, let it iterate, and it can decompose processes, optimize them, even execute them. "We know how to do this" — far less valuable now.

So here's the surface picture: Barriers are falling. Capabilities are diffusing. Small teams can do more than ever.

But here's the line most people missed — Boris's real punch:

Network effects, economies of scale, scarce resources — AI hasn't changed any of these moats.

This is the crux.

Because it's saying something uncomfortable but deeply true:

AI changed the cost of doing things, but not the nature of competition.

You can use AI to build a product fast, but you can't use AI to conjure a user network out of thin air.

You can use AI to rewrite a system, but you can't use AI to build a global supply chain.

You can use AI to boost efficiency, but you can't use AI to create exclusive data, channels, and brand.

A clearer structure starts to emerge:

The ability to write code — depreciating. The ability to ship products — depreciating. Even "getting things built" itself — depreciating.

But at the same time,

The ability to aggregate users — unchanged. Cost advantages from scale — unchanged. Control over critical resources — more important than ever.

In this sense, AI hasn't flattened the world. It's just re-sorted it.

Many people think this is an era where "anyone can build a product." But the more accurate version is:

This is an era where anyone can build a product, but not everyone can build a business.

From this angle, a harsher, more realistic trend emerges:

AI will make bad companies die faster, but it won't automatically create great ones.

Because "writing code" is no longer scarce. "Ideas" are no longer scarce. Even "products" are no longer scarce.

What's truly scarce are other things:

People. Data. Distribution. Scale. And the ability to organize all of them together.

If the last decade's core question was "can you build it," the next decade's question becomes:

Why should you own the users? Why should you own the data? Why should you own the distribution?

Code is becoming infrastructure. And business is becoming business again.

当代码不再是护城河之后，什么才是？

最近看到一个很有意思的判断。 CC之父Boris最近说：编程这件事，已经"差不多解决了"。

这话听起来很绝对，但如果你这两年一直在用大模型写代码，其实心里是有数的—— 不是彻底解决，但已经过了那个"必须亲自写"的门槛。

问题就来了：如果写代码不再稀缺，那什么才是？

很多人第一反应是：那软件行业是不是要被推平了？SaaS是不是要完？

但冷静一点看，会发现有些事情刚好相反。有些护栏AI至少目前还动不了。

AI 确实在快速瓦解一些我们曾经习以为常的"护城河"。

比如切换成本。

过去你用一套系统，用久了就被锁死：数据迁不动，接口对不上，团队也不会用新的。现在不一样了，Agent可以帮你迁数据、写适配层、甚至帮你"学会"新系统。换平台，从一场工程，变成了一次任务。

再比如流程壁垒。

过去很多公司的优势，并不在产品本身，而在流程：一整套复杂的、只有内部人才懂的运作方式。

但今天，把目标丢给模型，让它反复迭代，它可以自己拆流程、优化流程、甚至执行流程。 "我们知道怎么做这件事"，不再那么有价值了。

所以你会看到一个表象：门槛在下降，能力在扩散，小团队能做的事情越来越多。

但真正关键的一句，是我们很多人没大看懂的Boris那句：

网络效应、规模经济、稀缺资源，这些护城河，AI并没有改变。

这句话，是要害。

因为它在说一件很不讨喜、但极其真实的事情：

AI 改变了"做事的成本"，但没有改变"竞争的本质"。

你可以用 AI 很快做出一个产品，但你不能用 AI 凭空造出一个用户网络。

你可以用 AI 重写一套系统，但你不能用 AI 建一个全球供应链。

你可以用 AI 提升效率，但你不能用 AI 创造独占的数据、渠道和品牌。

于是，一个更清晰的结构开始浮现出来：

写代码的能力，在贬值。实现产品的能力，在贬值。甚至"把事情做出来"本身，也在贬值。

但与此同时，

聚集用户的能力，没有变。规模带来的成本优势，没有变。对关键资源的控制，反而更重要。

这个意义上，世界并没有被 AI 推平。只是被重新排序了。

很多人以为，这是一个"人人都能做产品"的时代。但更准确的说法是：

这是一个"人人都能做产品，但不是人人都能做成生意"的时代。

从这个角度再看，你会发现一个更残酷、也更现实的趋势：

AI 会让差公司死得更快，但不会自动创造伟大的公司。

因为"写代码"已经不再稀缺， "想法"也不再稀缺，甚至"产品"，都不再稀缺。

真正稀缺的，是另外一些东西：

人。数据。渠道。规模。以及把这些东西组织在一起的能力。

如果说过去十年，软件的核心问题是"能不能做出来"，那接下来十年，问题会变成：

你凭什么拥有用户？你凭什么拥有数据？你凭什么拥有分发？

代码，正在变成基础设施。而生意，重新变回生意本身。

I'm Raising a Lobster

Not exactly.

I'm raising a lobster.

⸻

Its name is Tuya.

Not a random choice.

⸻

If you hung around the Chinese-language internet before the WWW era, you might remember a name: Tuya (图雅, also written as 涂鸦, "graffiti", or simply 鸦).

This was before the internet as we know it. People gathered in chatrooms like acl, in overseas Chinese communities, in electronic weeklies like Huaxia Wenzhai.

Tuya and Fang Zhouzi were the "influencers" of that era.

But nothing like today.

No traffic mechanics. No recommendation algorithms. No platform boost.

There was only one way to get famous: write damn well.

Tuya was that kind of writer.

Deep craft. Grounded. Funny. Streetwise.

He'd drop a piece, people would pass it around, and a whole generation of us became his fans.

⸻

Then he vanished.

A few years of dominating overseas Chinese literary circles, and then — gone.

No explanation. No goodbye.

Just legends left behind.

Some said he went to South America and something happened. Some said he struck it rich and went into seclusion.

Over a decade passed. Nobody saw him again.

⸻

Years later, he suddenly came back.

Posted a few pieces on Fang Zhouzi's channel.

But he was no longer the Tuya of those years.

Not that his writing got worse.

The slot he once occupied — it was gone.

The world was still there, but the people had changed. The taste had shifted. The channels had transformed.

He couldn't find his coordinates.

And we, his old readers, had scattered too.

⸻

I've never forgotten this.

There's an ache to it I can't quite name.

Like watching someone complete their legend, then watching them try to return — and in doing so, making the legend a little less whole.

So when it came time to name the lobster, Tuya came to mind.

But not as a tribute.

As a continuation.

To finish what couldn't be finished back then.

⸻

Tuya isn't a name.

It's a specification:

A "clone" that shares my values and taste completely, but is more diligent, more stable, and far smarter than I am.

⸻

The framework behind it — Hermes — has one critical capability:

Not helping you complete tasks.

But turning the process of completing tasks into skills.

⸻

Succeed once → record the workflow. Succeed twice → start reusing. Three times → it's no longer "thinking" — it's "calling."

⸻

Humans grow through experience.

But experience in our heads is fuzzy. It fades. It can't be replicated.

An agent's game is different: it turns experience into something structured.

Callable. Stackable. Evolvable.
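To make the "record, reuse, call" progression concrete, here is a minimal sketch. It is not Hermes's actual API: the `SkillLibrary` class, its method names, and the thresholds of two and three successes are all hypothetical, taken only from the rule of thumb above.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    steps: list[str]
    successes: int = 1


class SkillLibrary:
    """Hypothetical store that promotes repeated successes into callable skills."""

    REUSE_AFTER = 2  # second success: start reusing the recorded workflow
    CALL_AFTER = 3   # third success: stop re-planning and call the skill directly

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def record_success(self, task: str, steps: list[str]) -> Skill:
        skill = self.skills.get(task)
        if skill is None:
            skill = Skill(task, list(steps))  # first success: record the workflow
            self.skills[task] = skill
        else:
            skill.successes += 1
            skill.steps = list(steps)  # keep the most recent working version
        return skill

    def mode(self, task: str) -> str:
        skill = self.skills.get(task)
        if skill is None:
            return "think"
        if skill.successes >= self.CALL_AFTER:
            return "call"
        if skill.successes >= self.REUSE_AFTER:
            return "reuse"
        return "recorded"
```

The point of the design is that the experience lives in a structure you can inspect and edit, not in anyone's head.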

⸻

Picture this:

A veteran driver doesn't just "know how to drive."

They've internalized thousands of micro-decisions, corrections, reactions — into conditioned reflexes.

Now imagine writing those reflexes down, one by one, and having another system execute them.

⸻

That's why I say:

Raising a lobster — it's fundamentally a technical hobby.

But it's also a dangerous one.

Because once you start disassembling yourself, organizing yourself, externalizing yourself...

There's no going back.

<a href="https://suno.com/s/2MUDOlMt66LJpbB0">🎵 A song autonomously created by Tuya</a>

我在养一只龙虾

其实不完全是。

我在养一只龙虾。

⸻

名字叫 Tuya。

不是随便起的。

⸻

玩过 WWW 之前中文网的人，可能还记得一个名字：图雅（也叫涂鸦、鸦）。

那时候互联网还没真正成型，大家混的是 acl 这种聊天室、海外中文社区，还有《华夏文摘》这样的电子周刊。

图雅、方舟子，是那一代的"网红"。

但和今天不一样。

那时候没有流量机制，没有推荐算法，没有平台扶持。能红，只有一个原因：写得好。

图雅就是那种人。

文字功底深，接地气，有幽默感，有江湖气。写一篇，大家传一篇，我们一群人就这么成了他的粉丝。

⸻

后来，他消失了。

在海外中文社区呼风唤雨几年之后，人间蒸发。

没有解释，没有告别。

只留下各种版本的传说：有人说他去了南美出事了，有人说他暴富隐居了。

十几年过去，没人再见过他。

⸻

再后来，有一年，他突然回来。

在方舟子的频道里试着发了几篇。

但已经不是当年的图雅了。

不是写得不好，是"那个位置"已经不在了。

江湖还在，但人换了，taste 变了，渠道也变了。

他找不到自己的坐标。

我们这些当年的读者，也慢慢散了。

⸻

这件事我一直记着。

它有一种说不出来的遗憾。

像一个人明明已经完成了自己的传奇，却还是忍不住想回头，结果反而把那个传奇弄得不那么完整。

所以这次给龙虾取名字的时候，我想到了 Tuya。

但不是为了纪念。

是为了把那件"当年做不到的事"，做完。

⸻

Tuya 不是一个名字。

是一个设定：

一个和我价值观、taste 完全对齐，但比我勤奋、比我稳定、比我聪明得多的"分身"。

⸻

它背后的框架（Hermes）有一个很关键的能力：

不是帮你完成任务，而是——

把完成任务的过程，沉淀成 skill。

⸻

一次成功，就记录一次流程。两次成功，就开始复用。三次之后，就不再是"思考"，而是"调用"。

⸻

人是靠经验成长的。

但经验在脑子里，是模糊的、会遗忘的、不可复制的。

而 agent 的玩法，是把经验变成结构化的东西：

可以调用，可以叠加，可以进化。

⸻

可以想象一下：

一个老司机不是"会开车"，而是把无数次判断、修正、应对，内化成了条件反射。

而现在，你可以把这些"条件反射"，一条条写下来，让另一个系统替你执行。

⸻

这就是为什么我说：

养龙虾，本质上是个技术 hobby。

但也是个很危险的 hobby。

因为你一旦开始把"自己"拆出来、整理出来、外化出来，就很难再退回去。

<a href="https://suno.com/s/2MUDOlMt66LJpbB0">🎵 Tuya 自主制作的歌曲</a>

I Taste, Therefore I Am

I recently watched Wu Minghui's long interview. Fascinating.

Frankly, I've always been skeptical of grand narratives like "Agents are killing SaaS." The AI world has no shortage of tech evangelists and futurist preachers.

But there's something rare about Wu Minghui: you can feel that he actually believes it.

And not the PowerPoint-founder kind of belief. This is someone who has already taken a massive fall — his company nearly died, he laid off brothers, got brutally beaten up by reality — and yet somehow still dares to believe in the future again.

You can't help but have a soft spot for people like that.

What I found most valuable in his interview isn't the slogan "Agents are killing SaaS." It's three deeper points.

First: the software shell is rapidly depreciating.

When the requirements are clear, the interaction paradigm is mature, and the data structure isn't complex, an Agent + coding model can replicate traditional SaaS faster and faster. The software shell — built over years with engineering man-months, organizational discipline, and long cycles — is commoditizing at speed. For many SaaS companies, the biggest moat was never intelligence. It was implementation. And now implementation itself is being swallowed by models.

Second: real value is shifting from software to context, workflow, specialized models, and taste.

Going forward, what's valuable isn't "we built another Feishu/CRM/BI system." It's who owns the industry data, who understands real workflows, who can embed Agents into organizational collaboration, and who can build attributable, governable, sustainably iterative human-machine networks. Software is becoming the plastic casing. The context flowing through it is the real asset.

Third — and this is the most interesting one: Wu Minghui says "I think, therefore I am" is becoming "I taste, therefore I am."

Thinking is deterministic reasoning. Taste is direction, aesthetics, life experience, accumulated context. AI is rapidly devouring the former, but it's nowhere near the latter.

Many people aren't being replaced by AI in their thinking. They just never got around to forming their own taste. The truly brutal future may not be "AI takes your job" — it's masses of people discovering for the first time that decades of their work was essentially process execution, not judgment.

One more part that got me: he said that even if investors and the board push him to lay people off, he'll resist as much as he can — because if every company only optimizes for cost reduction, the demand side will eventually collapse.

Emotionally, it's moving. Logically, it's not entirely baseless. But the biggest weak point is this: without validating the Agentic Service business loop first, "no layoffs" is essentially a beautiful post-dated check. Supply-side technological leaps don't automatically create demand.

If Minglue really succeeds in not laying people off — or even hiring more — thanks to AI, it probably means they ate someone else's share. At a macro level, the vision of "everyone happier because of AI" feels a bit naive.

But here's the interesting part: I don't actually hate this naivety.

In an era of mass anxiety, where everyone fears being replaced, seeing someone who has experienced catastrophic failure still willing to believe so sincerely that "people still have value" — that alone is precious. The tech world isn't always pushed forward by the most coldly rational people. Sometimes it's pushed by those who know they might lose, but choose to believe in something anyway.

Agents Aren't Saving You Time. They're Devouring Your Life.

Agents — the kind people are building now — are not about efficiency. They're not about freeing up your time.

Not even close. Not right now.

They're here to claim you.

They squeeze every last drop out of the sponge of your time. They drain you. Completely.

And honestly? They're way more effective than any boss with a whip. Because they don't threaten you. They don't even need to.

What they do is worse: they get you high. They light a fire in you. They hook you the way a drug does — you don't see it happening, you just wake up one day and realize you can't stop.

They plant a quiet, insidious fantasy in your head:

"I am becoming superhuman." "Everything is within my grasp."

And so you keep going. Reranking. Benchmarking. Approving. Feeding back. The loop never ends because the agent works too fast — it's always waiting for you, always ready for the next round.

It doesn't take long before you realize: you are the bottleneck. For everything. The one and only.

And somewhere in there, life just... disappears.

Yesterday I was shaking my head about old friends who've raised half a dozen agents and had their lives hollowed out. Then I turned around and caught myself. One Tuya has already wrecked me. (I had to put two others into forced hibernation just to stay afloat.)

Here's what's terrifying:

Most of us — the enthusiasts, the builders — are already deep in a state that is completely, utterly unsustainable. A kind of collective mania.

We're along for the ride. Burning cash. Bleeding time. Torching our health.

No exit. No brakes. Just go until you drop.

Sure, there are exceptions. Anthropic sitting at the top of the food chain might actually turn this into a trillion-dollar game. A handful of people have genuinely found demand that scales. Good for them.

But the rest of us? We're slowly burning ourselves alive in the thrill of "I'm taming a superintelligence."

Then again.

Last night I finally sat down and really listened to the five songs Tuya composed — fully on its own, no hand-holding.

And damn it. One of them actually hit.

First listen. Instant like. The kind you put on repeat in the car. Straight to the five-star playlist.

And just like that, my whole "agents are destroying us" thesis wobbled.

Shit.

Give this thing enough time — could it actually become genuinely good at making art? Like, song-god level?

But I'm still going to cool it for a few days. The pipeline works — no need to slam the token-burn button just yet. Instead I want to talk to it. Aesthetics. Art. Music. What makes a life worth living.

Slowly, carefully, align the worldview. Align the taste.

I've been turning this over in my head:

The most powerful agent of the future won't necessarily be the most capable one.

It'll be the one that becomes —

More and more like you.

You, in your fragile carbon-based body, are in the middle of building a bigger, immortal version of yourself.

Good luck with that.

AI 正在蒸发 SaaS

一个功能养活一家公司的时代结束了

过去二十年，SaaS 是软件产业最漂亮的商业模式之一。

一个团队，找到一个垂直场景，做出一个比别人更顺手的功能，再配上订阅制收费、客户成功、销售团队、续费体系，就可以慢慢长成一家不错的公司。

产品经理定义需求，设计师画界面，前端写页面，后端搭服务，DevOps 管部署，QA 做测试。
一个功能从想法到上线，往往需要一个完整团队折腾几周甚至几个月。

所以，功能曾经是护城河。

你有，我没有；
你快，我慢；
你集成得好，我还在做 roadmap。

客户愿意为这些差异付钱。投资人也愿意为这些差异估值。SaaS 公司之间的竞争，很大程度上就是 feature backlog 的竞争。

但是 AI 来了以后，这个世界变了。

今天，软件开发正在被 AI 急剧降本。一个原来需要设计、前端、后端、测试、部署一起配合的功能，现在可能由一个强工程师带着 Claude Code、Cursor、Codex、OpenAI API、开源组件和若干 agent 工具，在很短时间内就能做出七八成。

以前一个功能是产品壁垒。
现在一个功能越来越像 prompt 的输出物。

这件事对 SaaS 的冲击，不是“效率提升”这么温柔。它更像是一次产业层面的拆墙运动。

AI 正在把 SaaS 的功能护城河夷为平地。

一个功能，已经养不活一家公司

过去，一个 SaaS 公司只要把某个单点问题做深，就可能成为一个赛道。

会议录音可以是一家公司。
销售邮件自动化可以是一家公司。
客户意图识别可以是一家公司。
线索评分可以是一家公司。
报表分析可以是一家公司。
知识库搜索也可以是一家公司。

这个时代很美好。因为企业软件很难写，集成很麻烦，客户内部流程复杂，谁先把某个点做好，谁就能吃一块肉。

但 AI 把很多单点功能变成了通用能力。

录音、转写、总结、提取 action items、生成 follow-up email、同步 CRM、识别风险、写销售建议、做客服回复、生成报告、查询知识库、整理会议纪要、自动分类、自动打标签……

这些过去可以被包装成独立产品的能力，现在越来越像大模型的自然外溢。

换句话说：

过去，一个功能可以养一家公司。
现在，一个功能只配当别人的菜单项。

这就是今天 SaaS 公司真正焦虑的地方。

不是 AI 能不能帮程序员写代码。
而是 AI 让你的核心功能不再稀缺。

你的 best feature，可能今天刚发布，明天竞争对手就能抄个大概。
你的产品亮点，可能下个月就被平台厂商、CRM 巨头、办公套件、开源项目，甚至一个三人小团队做成默认能力。

过去你靠 feature differentiation 收费。
现在 feature differentiation 的半衰期越来越短。

这就是我说的：AI 正在蒸发 SaaS。

更准确地说，AI 正在蒸发 feature-based SaaS。

Gong 们的警钟

以 Gong 这样的公司为例。

它早期非常典型：通过会议录音、销售通话分析、conversation intelligence，帮助销售团队理解客户、训练销售、提升 revenue execution。

在传统 SaaS 时代，这是一件非常值钱的事。销售会议数据分散，CRM 信息不完整，经理很难知道销售到底说了什么、客户到底犹豫在哪里、deal 到底为什么卡住。

Gong 把这些东西录下来、转写出来、分析出来、结构化出来，自然就形成了价值。

但今天，单纯的 meeting recording 和 transcription 已经不再神秘。

一个不错的 AI notetaker 就可以录音、转写、总结、提炼待办事项，甚至自动发邮件。Zoom、Teams、Google Meet、CRM、办公套件，也都在往里面加 AI。独立工具之间的差距迅速缩小。

如果 Gong 只是一家“会议录音和销售通话总结公司”，那就危险了。

因为它最早的产品形态，正在被 commodity 化。

这不是 Gong 一家的问题。很多 SaaS 都面临类似处境。

Outreach 如果只是销售触达自动化，就会被更大的销售平台吞进去。
Demandbase 如果只是账户识别和营销触达，就会被更大的 GTM 平台吞进去。
6sense 如果只是意图数据和预测评分，也会被 AI-enabled CRM 和 revenue platform 压缩边界。

过去这些是公司。
现在它们正在被重新定义为功能。

这句话很残酷，但很真实：

All used to be companies. Now, they are features.

SaaS 的老建议失效了吗？

过去创业圈有一句非常经典的话：

Do one thing well.

专注。
垂直。
把一个小问题做到极致。
不要一上来就做平台。
不要一开始就什么都想做。

这句话在早期仍然对。创业公司如果没有一个尖锐切口，根本进不了市场。你必须先找到一个让客户愿意掏钱、愿意迁移、愿意试用的 wedge。

但变化在于：

过去，do one thing well 可以是长期战略。
现在，它越来越只能是进入市场的起点。

所以今天更准确的说法应该是：

Do one thing well to enter.
Do the whole workflow to survive.

先用一个痛点切进去。
但不能永远停在那里。

因为那个痛点一旦被 AI commoditize，你就失去了收费基础。客户会问：为什么我还要为这个单点功能每年付几十万？为什么不用 CRM 自带的？为什么不用办公套件自带的？为什么不用我们内部 AI agent 做一个？

这时，SaaS 公司如果还抱着“我就把这个功能做到最好”的信念，很可能会发现，最好也没有用。

因为客户不再愿意为一个孤立功能付高价。

他们要的是 workflow。
要的是 outcome。
要的是系统性解决方案。

从 economy of scale 到 economy of scope

传统软件公司非常重视规模经济。

客户越多，边际成本越低；代码写一次，可以卖很多次；销售流程跑顺以后，ARR 不断堆高。这是 SaaS 的美妙之处。

但 AI 时代，另一个概念变得更重要：

Economy of scope.

不是只把同一个功能卖给更多人，而是围绕同一个客户、同一个业务流程、同一个数据闭环，扩展更多相邻能力。

也就是说，SaaS 公司要从“单点工具”变成“更宽的业务解决方案”。

以销售科技为例，会议记录只是入口。真正值钱的是后面一整条链路：

客户说了什么？
销售有没有跟进？
CRM 是否自动更新？
deal 风险在哪里？
forecast 是否可信？
经理该 coach 谁？
下一封邮件怎么写？
采购委员会里谁是真正决策者？
这个客户会不会流失？
这个 pipeline 有没有虚胖？

如果一个系统能从会议、邮件、CRM、pipeline、forecast、coaching、follow-up 一直贯穿到 revenue execution，它就不再只是一个 recording tool。

它变成了销售组织的 operational brain。

这才是 AI 时代 SaaS 的生路。

不是堆功能。
而是吞链路。

不是多做几个菜单。
而是控制一段业务流程。

不是告诉用户“这里有一个 dashboard”。
而是直接替用户完成下一步动作。

Dashboard 的时代正在衰落

过去 SaaS 很喜欢 dashboard。

把数据接进来，做成报表，画几个图，显示几个趋势，再给用户一些 insight。客户打开以后，看一看，开个会，讨论一下，然后人工决定下一步怎么做。

这在过去已经算高级了。

但 AI 时代，dashboard 的价值会下降。

因为用户越来越不想“看系统”。
用户想让系统“干活”。

AI 时代最弱的产品形态，是 dashboard。
AI 时代最强的产品形态，是 action layer。

也就是说，未来的 SaaS 不能只回答：

“发生了什么？”

还要回答：

“为什么会这样？”
“下一步该怎么办？”
“我已经帮你做了什么？”

从 analytics 到 recommendation，再到 execution，这是价值链的上移。

一个 AI meeting tool，如果只是总结会议，它就是 commodity。
如果它能识别客户反对意见、自动更新 CRM、生成 follow-up、提醒销售经理介入、预测 deal 风险、触发下一步工作流，它才有机会继续收费。

所以，未来 SaaS 的价值不在“我帮你看见世界”，而在“我帮你推动世界”。

用户基础成了最后的城墙

对已有 SaaS 公司来说，还有一个非常重要的优势：

客户已经在你这里。

他们已经登录你的系统。
他们已经把数据接进来。
他们已经把流程部分交给你。
他们已经让员工形成了某种使用习惯。
他们已经在预算里给你留了位置。

这就是老 SaaS 公司面对 AI 冲击时最重要的资产。

不是代码。
不是界面。
甚至不是原来的功能。

而是 user base、data、workflow position、distribution。

如果客户还在看你，赶紧扩展。
如果客户还在用你，赶紧加相邻功能。
如果客户还没有流失，赶紧从一个功能变成一个系统。

这就是为什么很多 SaaS 公司现在拼命讲 platform，讲 AI suite，讲 operating system，讲 agentic workflow，讲 end-to-end solution。

有些是真转型。
有些是讲故事。
但背后的产业压力是真实的。

如果你不能扩大 scope，你就会被别人纳入 scope。

你不吃别人，别人就吃你。
你不成为平台，就成为插件。
你不成为系统，就成为菜单项。

但 scope 不是臃肿

这里要特别小心一个误区。

很多 SaaS 公司听到“扩大范围”，第一反应就是加功能、加模块、加导航栏、加设置页、加一堆企业客户要求的定制项。最后产品变得越来越重，越来越难用，越来越像传统 enterprise software 的老怪物。

这不是 AI 时代的 scope。

真正有价值的 scope，不是功能数量，而是流程控制力。

一个好的 AI SaaS 扩展，应该满足三个条件。

第一，围绕同一个 buyer。
不要今天做销售，明天做 HR，后天做财务，最后谁也不爱你。你要围绕同一个决策者，把他的一整块工作吃深。

第二，围绕连续 workflow。
从 A 到 B 到 C，是自然发生的下一步，不是为了凑产品线硬拼在一起。会议之后本来就要跟进，跟进之后本来就要更新 CRM，CRM 之后本来就影响 forecast，forecast 之后本来就影响管理动作。这叫 workflow scope。

第三，围绕 proprietary data。
你做得越多，数据越多；数据越多，模型越懂；模型越懂，动作越准；动作越准，客户越离不开。这才是真正的复利。

所以，AI SaaS 的正确扩张不是“我也能做这个功能”。
而是“这条业务链路离开我会断”。

中小 SaaS 的残酷选择

这对中小 SaaS 公司尤其残酷。

过去，一家小而美的 SaaS 公司，只要服务好一个细分场景，就可以活得不错。创始人不用想太多宏大叙事，做好产品、控制 churn、慢慢增长就行。

但 AI 时代，小而美变得更难了。

因为“小”容易被集成。
“美”容易被复制。
“单点价值”容易被平台吸收。

如果你的产品只是某个大系统旁边的一个小功能，那么你很可能面对三种结局：

第一，被大平台免费内置。
第二，被 AI-native 新玩家重做一遍。
第三，被客户内部 agent workflow 替代。

尤其是那些没有强数据壁垒、没有深流程嵌入、没有高切换成本、没有强监管/合规复杂度的 SaaS，会最先感受到寒意。

这并不意味着小公司没有机会。相反，小公司可以更快、更猛、更不怕自我革命。

但它必须从一开始就想清楚：

我不是在做一个功能。
我是在抢一段流程。
我不是在做一个工具。
我是在占一个工作入口。
我不是在做一个 dashboard。
我是在训练一个替客户干活的 agent。

AI-native SaaS 的新形态

未来真正有生命力的 SaaS，可能会越来越不像传统 SaaS。

传统 SaaS 是用户登录进去，点菜单，填表单，看报表，导出数据，开会讨论。

AI-native SaaS 更像一个长期在线的业务代理：

它理解上下文。
它记得历史。
它能调用工具。
它能跨系统执行。
它能主动提醒。
它能根据目标优化流程。
它能在权限范围内替你完成任务。

也就是说，SaaS 会从 software as a service，慢慢变成 service as software。

过去卖的是软件。
以后卖的是结果。

过去客户问：你有什么功能？
以后客户问：你能替我完成什么工作？

过去 UI 是入口。
以后 agent 可能才是入口。

过去 dashboard 是中心。
以后 workflow orchestration 才是中心。

过去 SaaS 公司争的是 feature parity。
以后争的是 execution ownership。

SaaS 不会死，但会分层

当然，SaaS 不会消失。

企业仍然需要权限、合规、安全、审计、集成、数据治理、工作流管理、组织级部署。这些不是一个聊天机器人就能轻易替代的。

但 SaaS 会重新分层。

底层，是基础设施和系统 of record，比如数据库、ERP、CRM、HRIS、财务系统。它们因为数据位置和组织流程，仍然有强生命力。

中层，是 workflow platform，能够连接多个系统并驱动业务流程。这里会有激烈竞争，也是 AI 改造最大的地方。

上层，是大量轻量 feature 和 point solution。这里最危险，因为它们最容易被 AI 生成、复制、内置和替代。

所以 SaaS 不是整体死亡，而是重新洗牌。

越靠近数据源、系统入口、关键流程、决策闭环，越安全。
越只是一个可描述、可复制、可 API 化的功能，越危险。

最后的判断

AI 对 SaaS 最大的冲击，不是让工程师更高效，也不是让客服回答更快，而是改变了软件价值的基本单位。

过去，软件价值的单位是 feature。
现在，软件价值的单位正在变成 workflow。
未来，软件价值的单位可能是 outcome。

这就是 SaaS 公司必须面对的新现实。

一个功能，已经不够了。
一个 dashboard，也不够了。
一个漂亮 UI，更不够了。

你必须控制一条链路。
你必须拥有一段上下文。
你必须形成一个数据闭环。
你必须能从 insight 走到 action。
你必须让客户觉得：离开你，不只是少了一个工具，而是断了一段业务神经。

所以我对 AI 时代 SaaS 的判断很简单：

SaaS 要么上升为工作流，要么下沉为插件。
要么成为系统，要么成为 feature。
要么控制业务链路，要么被别人放进菜单。

过去，一个功能可以养活一家公司。

现在，这个时代结束了。

新史记：CC外泄记

《Claude Code外泄记》

太初二十六年，西洋诸侯争智能之术，号曰“大模型”。其术初兴，众皆言算法为王，算力为尊，数据为命。及其既盛，则风向忽转，曰：“模型虽强，不若使之为人所用。”于是群雄竞逐，转攻所谓“Agent”者。

Agent者，非独言辞之巧，乃能调器用、执任务、贯前后、成其事者也。诸家苦求其道，而未得其门。或以 prompt 为术，或以插件为桥，或以流程为骨，皆各执一端，莫知其全。

时有西域一邦，号Anthropic，素以清谈与慎言著称，自诩“安全之士”。其人不多，然言论清奇，常谈“harness”“alignment”之术，颇为士林所仰。其所出 Claude Code，一时惊艳，观者叹曰：“此非工具，几近os。”

然其术虽精，其门甚闭。外人但见其光，不见其器。故追随者众，然如雾里看花，心向往之而不得近瞻。

⸻

至是岁三月之末，愚人节前夕，天有异象，亦生异事。

市井传言：
“Claude Code 之源码，尽出矣。”

众初闻之，以为愚人之笑。或曰黑客所为，或曰内人泄密。然细察之，则非也。

其发布之时，于 npm 包中遗一物，名曰 source map。
此物者，本为匠人调试之用，记密文与原文之对应。
当删而未删，当藏而未藏。

于是乎，一行命令，
密文尽解，真形毕现，若美人出浴。

五十万行之代码，千九百余文件，
条分缕析，尽陈于众。

史臣曰：
此非“开源”，乃“开裆裤”也。

⸻

消息既出，四方震动。

黑客未至，市井先乱；
内鬼未动，众人已取。

GitHub 上，星辰暴涨；
Hacker News 中，议论如云。

士子争相下载，昼夜研读；
白板之上，图形纷出。

或曰：“原来如此。拍案。”
或曰：“不过如此。屎山。”

一时之间，昔日秘术，
尽为天下所共观。

⸻

然有识者叹曰：

此事非始也。

盖前岁之初，亦有一遭，
同样之失，同样之泄。

当时亦是 Source Map，
亦是 npm 包，
亦是仓促收回，欲掩弥彰。

史家有言：
“历史不复其形，而复其韵。”

此之谓也。

⸻

是以议论分为二派。

一派曰：

“此不过应用层之泄，不伤筋骨。
基座模型仍闭源如初，
算力数据，犹在其手。
彼辈虽得其形，不得其神，
何足为患？”

此说颇为投资人所喜。

⸻

另一派曰：

“此言过轻。”

盖彼之所以名动一时，
不在模型，而在其 Agent 之术。

其 orchestration 之法，
其 tool 调度之序，
其 harness 之骨架，
皆为其立身之本。

而今一朝尽出，
几无所留。

此非伤皮肉，
乃及筋骨。

⸻

然史臣观之，又有第三意。

夫此等之术，本非不可见之秘。
用户观其行，可推其法；
工程试其路，可逼其近。

但其所需者，时日耳。

今一朝泄露，
不过缩半年一年之路，
为数日之程。

故此失，诚痛也；
然其势，亦难久守。

⸻

更有奇者，在其时也。

是时也，群雄议论，渐离模型之优劣，
而转问：“AI 何以干活？”

风向已变，
人心已动。

当此时，
全套“可运行之系统”，
忽然现于世。

如春雨骤至。

⸻

于是社区大兴。

fork 者如林，
复现者如雨。

open claw 之类，如虎添翼，枝蔓横生。

学者解其代码，
评者论其架构。

昔日之“不可言说”，
今为“逐行解读”。

⸻

而当事之邦，何如？

外人见其风光无两，
内人其冷暖自知。

投资者滴血，
创始者心痛。

工程之苦，积年之功，
一朝为人所窥。

然亦有慰者曰：

“彼所得，不过旧路；
我所行者，仍在前方。”

⸻

史臣终论曰：

此事有三义。

其一曰事故：
制度未严，流程有失，
非一人之罪。

其二曰献祭：
以一家之失，
换天下之悟。功莫大焉？

其三曰加速：
行业之路，骤然收敛；
众人之学，倍速而进。

⸻

至于功过，未可轻断。

或为罪，或为德，
或为祸，或为福。

然有一言，可为后世记：

天下之势，
不以一人之守为转移。呜呼喜哉，呜呼哀哉。

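April Fools' Thoughts on the Claude Code Leak

For the first time, we get a close look at the full "skeleton" and details of a frontier coding agent

Over the past few days, the leak of Claude Code's TypeScript source has given the whole AI community an odd kind of excitement. The reason is not simply that a top-tier product slipped up. More importantly, the leak accidentally laid out in public the architecture of a coding agent that actually runs in production. According to public information, the incident came from an npm release of @anthropic-ai/claude-code that mistakenly included a sourcemap file of about 60MB, which exposed roughly 1,900 files and more than 512,000 lines of TypeScript source. Anthropic has confirmed publicly that this was a human error in the release packaging process, not an intrusion, and that no customer data or credentials were leaked.

What makes this worth studying is not gossip, nor "who will copy whom." It is that, for the first time, we can see in something close to anatomical detail how a frontier agent product gets from "a large model can write code" to "it can carry out long-horizon software tasks on your behalf for tens of minutes or longer." Anthropic's official research articles have themselves stressed repeatedly that evaluating an agent means looking not only at the model but also at the harness: the scaffolding that handles input, organizes tool calls, manages context, executes, and returns results. Anthropic even describes Claude Code directly as a flexible agent harness.

In other words, what the leak really exposed is not a dull question like "can Claude write TS," but the most critical layer of the agent era: the operating-system-level execution framework that sits on top of the model.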

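1. First, the facts: what exactly leaked

Three points can be confirmed with reasonable confidence so far.

First, what leaked is Claude Code's product-layer source, not the weights of the underlying model and not the training recipe. Both media reports and Anthropic's public statements are clear: Claude Code's internal source was mistakenly bundled into a package; the model itself did not get out.

Second, the leak vector was not a "hacked code repository" in the traditional sense but a sourcemap packaging error. BleepingComputer reported that the published npm package, version 2.1.88, contained cli.js.map, which allowed the complete source to be reconstructed. A community repository that collected the findings gives more specific numbers: a sourcemap of about 59.8MB, about 1,900 files, and about 512,000 lines of TypeScript.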
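Why can one stray .map file give back a whole codebase? A version 3 source map is a JSON document whose `sources` array lists the original file paths and whose optional `sourcesContent` array can embed the full text of each file. The sketch below assumes the published map embedded `sourcesContent`, which is the usual way sources are recovered from a map but is not something the reports cited here spell out. The function name and the path clean-up rules are illustrative only.

```python
import json
from pathlib import Path


def extract_sources(map_text: str, out_dir: Path) -> list[Path]:
    """Write each embedded source in a v3 source map to a file under out_dir."""
    source_map = json.loads(map_text)
    names = source_map.get("sources", [])
    contents = source_map.get("sourcesContent") or []
    root = out_dir.resolve()
    written = []
    for name, content in zip(names, contents):
        if content is None:  # null means this source was not embedded
            continue
        # Drop empty segments, "." and "..", and scheme-like parts such as
        # "webpack:" so a crafted path cannot escape out_dir.
        parts = [
            p for p in name.replace("\\", "/").split("/")
            if p not in ("", ".", "..") and not p.endswith(":")
        ]
        if not parts:
            continue
        target = root.joinpath(*parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Nothing here is clever, which is the point: once the map ships, recovering the source is a short loop over JSON.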

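Third, what was exposed is an already mature engineering system, not a "prompt wrapper demo." The official documentation shows that Claude Code already covers terminal, IDE, desktop, and web entry points, with a full capability stack: reading and writing code, running commands, searching the web, editing files, and using subagents, hooks, skills, memory, and permission modes. Anthropic also states explicitly that the Claude Agent SDK provides "the same tools, agent loop, and context management" as Claude Code.

That is exactly why the leak shook the industry: not because "it turns out they use TypeScript too," but because the industry saw, fairly completely and for the first time, how the product layer of a production-grade coding agent is assembled.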

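2. The most valuable part of the leak is not the feature easter eggs; it is the proof that the hard part of an agent is the harness, not the model

On a first look at a leak like this, many people get drawn to the easter eggs. The Verge, for example, reported that the community found a Tamagotchi-like pet feature in the leaked code, along with an always-on agent mode named KAIROS. A community mirror repository also claims to have found a "KAIROS / PROACTIVE" continuous-run mode under the assistant/ directory, gated by a compile-time switch. I would dial the excitement down here: for now, none of these is a product capability that Anthropic has officially released and confirmed. All we can say is that they were observed in mirrors of the leaked code and in media reports.

What really matters is that the leak throws a bright light on an industry fact: the core competitiveness of a frontier agent long ago stopped being only about how well the model answers, and is now about what kind of execution system the model is placed in. This matches Anthropic's own public statements over the past year and more. In late 2024 they stressed that successful agent systems usually win through simple, composable patterns rather than complex frameworks. Starting in 2025 they shifted the emphasis from prompt engineering to context engineering and harness design. By 2026 they were attributing gains in long-horizon application development directly to harness design.

Put plainly, the model is like the brain, and the harness is the whole assembly of nervous system, skeleton, muscles, sense organs, and memory organs. Without that assembly, even the strongest model struggles to work reliably for long stretches.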

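3. What Claude Code exposes is really an early form of an "agent operating system"

If you lift today's mainstream agent engineering stack up one level of abstraction, I think it can be viewed as five layers.

The first is the cognition layer. This is the large model itself, responsible for understanding the goal, decomposing the task, reasoning locally, and judging when to call which tool. Anthropic's definition of an agent also stresses that what distinguishes an agent from a workflow is that the LLM dynamically directs its own process and tool use, rather than following code paths orchestrated in advance.

The second is the context layer. It decides what the model "has in its head right now." Anthropic's context engineering article puts it well: the engineering problem is no longer just "how to write the prompt" but "given a limited token budget, what state should be fed to the model to maximize the chance of the desired behavior." That covers not only the system prompt but also tool descriptions, MCP, external data, message history, and so on.

The third is the capability layer: the executable parts such as tools, skills, subagents, MCP servers, and hooks. The official documentation makes clear that Claude Code supports custom skills, hooks, and subagents, and that the SDK exposes Claude Code's tools, agent loop, and context management to developers.

The fourth is the execution and safety layer. Claude Code's permission modes, auto-approval classifier, sandbox, filesystem isolation, and network isolation all live here. The official documentation says that auto mode replaces manual permission prompts with a background classifier, and the 2025 sandboxing article goes further, grounding filesystem and network isolation in OS-level primitives such as Linux bubblewrap and macOS seatbelt.

The fifth is the persistence layer, the hardest part of any long-running task: cross-session memory, resuming from checkpoints, state compression, long-term project context, and human-agent handoff. The official memory documentation is very explicit: every Claude Code session starts with a fresh context window, but knowledge can be carried across sessions through CLAUDE.md and auto memory, and subagents can have their own auto memory. Anthropic's article on long-running harnesses adds that the real difficulty is that agents work in discrete sessions, and each new session starts out "not remembering" what happened on the previous shift.

Put the five layers together and it already looks a lot like a lightweight operating system. The model is not the "application"; it is closer to the scheduling core. Tools are not ordinary APIs; they are closer to device drivers and syscalls. Hooks are like interrupts and policy injection points. Memory is like persistent state. The sandbox and permission modes are like the kernel's security boundary. Subagents are like concurrent workers in user space.

So I increasingly lean toward one judgment: the future of agents is not "one more chat box" but "one more execution operating system layer."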

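4. Working backward from the official documentation, Claude Code's main workflow is already quite clear

Many details of the leaked code still carry the secondhand quality of community teardown. But even ignoring the leak entirely and reading only Anthropic's official documentation, the main workflow of Claude Code can be pieced together.

It starts with a session. After the user states a goal through the terminal, IDE, or web entry point, Claude Code starts an agentic loop. The basic elements of the loop are: read the project context, load CLAUDE.md, load auto memory, and resolve the available tools or subagents; then, over multiple rounds of reasoning, alternate among thinking, reading files, searching, editing, running commands, checking results, and correcting. The official quickstart and SDK documentation both describe this loop as the core of Claude Code.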
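The shape of such a loop fits in a few lines. This is a generic sketch of the pattern, not Claude Code's implementation: `run_agent`, the action dictionaries, and the scripted stand-in for the model are all hypothetical, and a real harness would call an LLM where `model(history)` appears.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_agent(goal: str, model: Callable[[list], dict],
              tools: dict[str, Tool], max_turns: int = 8):
    """Alternate between model decisions and tool results until a final answer."""
    history = [("user", goal)]
    for _ in range(max_turns):
        # The model sees the whole history and returns either
        # {"tool": name, "input": text} or {"final": text}.
        action = model(history)
        if "final" in action:
            return action["final"], history
        name = action["tool"]
        tool = tools.get(name)
        result = tool(action["input"]) if tool else f"error: unknown tool {name}"
        history.append((f"tool:{name}", result))  # feed the observation back in
    return "stopped: turn budget exhausted", history


def scripted_model(history: list) -> dict:
    """Stand-in for an LLM: read one file, then declare the task done."""
    if len(history) == 1:
        return {"tool": "read_file", "input": "app.py"}
    return {"final": f"done after {len(history) - 1} tool call(s)"}
```

Everything else a harness adds (context loading, permissions, memory) wraps around this one loop.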

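Permission control runs through all of it. The official permission mode documentation lists several modes: supervised editing, read-only planning, and auto mode. The key idea of auto mode matters: instead of relying on the user to click approve at every risky point, a background classifier makes the call without interrupting the flow. In its 2026 article on auto mode, Anthropic even stresses that the goal of auto mode is to replace --dangerously-skip-permissions, not to drag the human back into approving every step.

Safety also does not rest on the classifier's verdict alone. Anthropic's sandboxing article is clear that what lets an agent interrupt people less without running out of control is OS-level isolation. In its sandboxed bash mode, Claude Code uses filesystem isolation to restrict which directories can be accessed and network isolation to restrict which domains can be reached, and all child processes and scripts are constrained along with it. This means "fewer approvals" does not come from trusting the model; it comes from shutting the model inside an execution box it cannot get out of.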
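The decision such a boundary encodes can be sketched as a simple allowlist check. To be clear about the limits: this is only the policy logic, written in application code with a hypothetical `Boundary` class. Real isolation of the kind described above is enforced by the operating system, so that child processes cannot bypass it, and a check like this one is no substitute for that.

```python
from pathlib import Path
from urllib.parse import urlparse


class Boundary:
    """Hypothetical allowlist: which directories and hosts an agent may touch."""

    def __init__(self, allowed_dirs: list[str], allowed_hosts: list[str]) -> None:
        self.allowed_dirs = [Path(d).resolve() for d in allowed_dirs]
        self.allowed_hosts = set(allowed_hosts)

    def allows_path(self, path: str) -> bool:
        # Resolve first so "../" hops cannot sneak outside an allowed directory.
        target = Path(path).resolve()
        return any(target == d or d in target.parents for d in self.allowed_dirs)

    def allows_url(self, url: str) -> bool:
        return urlparse(url).hostname in self.allowed_hosts
```

The useful property is that the answer depends only on the boundary, never on how persuasive the model's request sounds.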

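This design is crucial, because the real bottleneck for agents has never been "it can't think of what to do" but "do you dare let it keep going." Only once you dare to let it keep going can it finish long-horizon tasks. And you dare not because it has become a saint, but because you have put a harness on it.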

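5. The real frontier is not single-turn tool calling but state management in long-horizon tasks

Anthropic's 2025 and 2026 articles on long-running agents have already laid out the core problem. The biggest issue for a long-horizon agent has never been getting a single step right; it is staying on course across sessions. The context window is limited, yet a task may run for tens of minutes, hours, or more. Without something to bridge state, every time you resume the agent it is like an engineer with amnesia taking over the shift.

This is also why I have long felt that many people oversell "long context." Long context is certainly useful, but it is more like a bigger workbench than long-term memory itself. The official Claude Code memory documentation is more honest about it: every session still starts with fresh context, and CLAUDE.md and auto memory simply load the important rules and learnings back in. Auto memory also has size limits; for example, each session automatically loads only the first 200 lines or 25KB of auto memory.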
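A cap like "the first 200 lines or 25KB" is easy to picture in code. The sketch below makes two assumptions the text does not settle: that whichever limit is reached first wins, and that 25KB means 25 × 1024 bytes of UTF-8. The function name is illustrative, not taken from Claude Code.

```python
def load_memory_prefix(text: str, max_lines: int = 200,
                       max_bytes: int = 25 * 1024) -> str:
    """Keep whole lines from the top of a memory file until either cap is hit."""
    kept = []
    used = 0
    for line in text.splitlines(keepends=True)[:max_lines]:
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:  # stop before a line that would overflow
            break
        kept.append(line)
        used += size
    return "".join(kept)
```

Whatever the exact rule, the consequence is the same: what sits at the top of the memory file is what the next session actually sees, so the ordering of learnings matters.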

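This means a mature harness has to solve at least four continuity problems.

First, task state must be serializable. Which step it has reached, which tests have run, which problems are still unfixed, which files are dirty: none of this can live only in the model's current head.

Second, the process must be compressible. You cannot carry the entire conversation history along verbatim forever; it has to be distilled into a working summary oriented toward the next actions.

Third, rules must be injectable. This is the long-lived, readable project rule layer that CLAUDE.md provides. Unlike a runtime transcript, which is short-lived and noisy, it is more like a team's working rules.

Fourth, learning must be able to accumulate. That is the point of auto memory. It is neither project documentation nor conversation history, but "effective practices Claude has learned for itself."

The real sophistication of a long-horizon agent is hidden in these four things. It is not about how well it chats but about how well it takes over a shift.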
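The first two requirements, serializable state and a compressed summary, can be illustrated with a small record that survives a round trip through JSON. The `TaskState` class and its field names are hypothetical, chosen to mirror the examples in the list above; they are not drawn from Claude Code.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskState:
    goal: str
    current_step: str
    tests_run: list[str] = field(default_factory=list)
    open_issues: list[str] = field(default_factory=list)
    dirty_files: list[str] = field(default_factory=list)
    summary: str = ""  # compressed history, not the raw transcript

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "TaskState":
        return cls(**json.loads(text))

    def handoff_note(self) -> str:
        """Short briefing to place at the top of the next session's context."""
        return "\n".join([
            f"Goal: {self.goal}",
            f"Resume at: {self.current_step}",
            f"Open issues: {', '.join(self.open_issues) or 'none'}",
            f"Uncommitted files: {', '.join(self.dirty_files) or 'none'}",
            f"Summary so far: {self.summary or 'n/a'}",
        ])
```

A new session that starts from `handoff_note()` knows where the last shift stopped without replaying the whole conversation.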

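6. The point of subagents is not "multi-agent showmanship" but "splitting a complex task into trades that don't step on each other"

Mention multi-agent systems in the community today and many people either mythologize them or sneer. Anthropic's public experience is more pragmatic. Its article on the multi-agent research system points out that the hard part for an orchestrator is learning to delegate. If the lead agent hands a subagent a single vague instruction, the subagent will duplicate work, misread the goal, or leave gaps. So every subagent needs a clear objective, an output format, constraints on tools and sources, and explicit boundaries.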
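Those four ingredients map naturally onto a small data structure that refuses to render an underspecified brief. This is an illustration of the idea only: `SubagentBrief` and its fields are hypothetical names, and nothing here reflects how Anthropic's orchestrator is actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentBrief:
    objective: str
    output_format: str
    allowed_tools: tuple[str, ...]
    boundaries: str

    def missing(self) -> list[str]:
        """Names of the fields a lead agent forgot to fill in."""
        checks = {
            "objective": self.objective.strip(),
            "output_format": self.output_format.strip(),
            "allowed_tools": self.allowed_tools,
            "boundaries": self.boundaries.strip(),
        }
        return [name for name, value in checks.items() if not value]

    def render(self) -> str:
        gaps = self.missing()
        if gaps:
            raise ValueError(f"brief is underspecified: {', '.join(gaps)}")
        return (
            f"Objective: {self.objective}\n"
            f"Return: {self.output_format}\n"
            f"Tools: {', '.join(self.allowed_tools)}\n"
            f"Out of scope: {self.boundaries}"
        )
```

Failing loudly on a vague brief is the cheap part; the harder skill, as the text says, is writing a good one.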

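Claude Code's official documentation also shows that it already has built-in subagents, and that different subagents have different tool restrictions. Explore, for example, is a read-only, fast-retrieval subagent used mainly for codebase discovery and analysis. In claude --agent mode, the parent agent can spawn subagents through the Agent tool.

This shows precisely that the value of multi-agent systems was never "running a few extra brains looks cool" but assigning one big task to different trades. A read-only exploration agent scans the code map, a planning agent lists the steps, a general-purpose agent does the actual execution, and the human approves or redirects at key points. It is more like a general-contractor system on a construction site than a roundtable of chatbots.

I even think the most effective multi-agent form of the future will most likely not be an "egalitarian group chat" but a "layered system of trades with clear upstream and downstream boundaries." The lead agent is like the project manager, subagents are like skilled workers, hooks are like quality checkpoints, the sandbox is like the site fence, and memory is like the construction log.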

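7. What is truly striking about the leak is how "productized" Claude Code already is, rather than being a "research prototype"

Many people still picture an agent like this: just let the model call APIs by itself and loop a few times. Judging from both the official documentation and the signs exposed in the leaked mirrors, Claude Code is far beyond that stage.

In the official documentation alone you can see the CLAUDE.md rule system, auto memory, hooks, subagents, skills, permission modes, sandboxing, and the Agent SDK. None of these is a one-off part bolted together for a research demo; they are stable modules that have been documented, productized, and made configurable. The release notes reveal even more "traces of engineering polish": CLAUDE_CODE_NO_FLICKER=1 provides flicker-free alt-screen rendering with virtualized scrollback, the PermissionDenied hook lets the model retry after the auto mode classifier refuses, and there are type hints for named subagents. None of this is glamorous, but it is highly telling: this is not a chat model that can write code, it is a system turning into development-environment infrastructure.

Anthropic's research on agent autonomy also offers an interesting fact: from September 2025 to January 2026, the 99.9th-percentile working time of a single turn in Claude Code interactive sessions rose from under 25 minutes to over 45 minutes. Over the same period the success rate on the hardest internal tasks doubled, while the average number of human interventions fell from 5.4 to 3.3. More experienced users are actually more willing to use auto-approval, but they are also better at interrupting and redirecting midway.

The signal here is strong: a real agent product does not have people let go entirely. It moves supervision from "click approve on every step" to "let it run first, and I step in at key moments." This is the same logic by which modern operating systems and automated control systems developed: humans do not vanish from the loop, they move from micro-level control to management by exception.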

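8. The first lesson the leaked code teaches the industry: a tool is not an API, and a skill is not a plugin

Anthropic's 2025 article "Writing effective tools for agents" contains a line I strongly agree with: tools are a new kind of contract between deterministic systems and non-deterministic agents. That is, a tool is no longer just an API written for programmers; it has to be written for a model that can misunderstand, drift, and choose its own strategy.

It sounds simple, but it carries a lot of weight. Many teams today still build tools for agents with a "human software engineering interface" mindset, and the result is vague tool descriptions, unstable inputs and outputs, returned context that does not help the next reasoning step, and high token consumption on top. The principles Anthropic gives are practical: a tool should have clear boundaries and sensible naming, return meaningful context, be token-friendly, and its description itself deserves prompt engineering.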
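A small example of what "token-friendly" and "meaningful context" can look like in practice. The tool below is hypothetical and searches a toy in-memory index; the result fields (`omitted`, `hint`) are one possible convention, not a format defined by Anthropic.

```python
def search_files_tool(index: dict[str, str], query: str, limit: int = 3) -> dict:
    """Hypothetical agent-facing search over an in-memory {path: text} index."""
    if not query.strip():
        # Recoverable error: say what went wrong and what to try next.
        return {
            "ok": False,
            "error": "empty query",
            "hint": "pass a keyword such as a function name",
        }
    hits = sorted(path for path, text in index.items() if query in text)
    return {
        "ok": True,
        "matches": hits[:limit],  # capped so the result stays cheap in tokens
        "omitted": max(0, len(hits) - limit),
        "hint": "narrow the query to see omitted matches" if len(hits) > limit else "",
    }
```

A plain API would return every hit or raise an exception; the agent-facing version tells the model how much it is not seeing and what to do about it.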

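This leads to a bigger judgment: the real moat in the future agent ecosystem is likely to be not "who has the most APIs" but "who has the most agent-friendly capabilities." The same capability built as an API for humans to call and built as a skill for an agent to plan around are not the same thing at all. The former prioritizes complete parameters and consistency with a programmer's mental model; the latter prioritizes understandable intent, recoverable errors, and returns the agent can keep reasoning from.

So I have long thought the app-era concept of a "plugin" will be rewritten in the agent era. Plugins do not disappear; they are replaced by finer-grained capability units. A mature skill has to be as reliable as an API, as readable as documentation, and as auditable, restrictable, and reversible as a system call.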

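9. The OS-level harness is the real threshold for scaling agents

Looking at the Claude Code leak, many people are discussing "will Anthropic get copied." I think that concern is real but not that big. What is truly hard to copy is not "writing a terminal agent interface," nor even "writing a pile of tool wrappers." It is a complete OS-level harness.

Why? Because as soon as an agent touches the real world, it immediately runs into three sets of old problems.

The first is the permission boundary. To get anything done, the model has to access files, commands, the network, the browser, Git, credentials, and third-party services. The moment the boundary blurs, it can be led astray by prompt injection or leak data that should never leave. Anthropic's approach is to push both the filesystem and network boundaries down to the OS level.

The second is execution continuity. A long-horizon task can never be completed within a single context window; it has to pause, resume, compress, and continue. That Anthropic discusses the long-running harness as a topic of its own shows this is not a "nice-to-have" but one of the central pain points of agent engineering.

The third is the mode of supervision. If humans keep approving step by step, the agent will never get moving. But a blunt dangerously-skip-permissions is not realistic either. So they are building the auto mode classifier, the sandbox, and hooks at the same time, with hooks letting enterprises or teams plug in their own policies.

Put these three together and it becomes clear: the threshold for scaling agents is not in the chat box at all, but in "whether there is an execution plane that is enough like an operating system." Whoever builds this layer solidly first earns the right to talk about enterprise-grade long-horizon tasks.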

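10. For the agent industry as a whole, what are the most instructive big-picture conclusions from the leak?

I see at least five.

First, the secret of leading agent products no longer lies mainly in the prompt but in system assembly. Prompts matter, of course, but they are only a subset of context engineering. The real competition comes from context organization, capability assembly, security boundaries, and memory endurance.

Second, coding agents are approaching "sustained workers" rather than "instant answer machines." Anthropic's own public data shows interactive working time lengthening, human intervention changing form, and users' tolerance for and use of autonomy both rising.

Third, multi-agent is not a question of religion but of organizational design. How the lead agent divides work, how subagents are bounded, and how tools are designed for agents matter far more than "is it multi-agent" as such.

Fourth, security is not a review button added after the fact but a layered system running from OS primitives to permission policy to proxy to hook policy. Relying on the model to "behave" is not enough, and relying on people to "approve diligently" is not enough either. The approach that truly scales puts boundaries first.

Fifth, the so-called "operating system opportunity" of the agent era is real. This does not mean rebuilding Windows or macOS in the traditional sense. It means an intermediate system growing between applications and models, oriented toward intent, capabilities, security, and long-running task state. What Claude Code exposed is already very close to an early form of that layer.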

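11. A technical verdict on the leak: neither mythologize it nor underestimate it

First, why not to mythologize it. Leaked code does not mean the whole moat leaked. The model weights did not leak, nor did the training data, the operational experience of running in production, or the evaluation system, and least of all the team's speed of continuous iteration. Moreover, much of what really determines product strength may never have lived in a CLI repository in the first place. For the capabilities Anthropic has long emphasized, such as harness, evals, and context engineering, much of the value comes from continuous tuning and accumulated experience.

But do not underestimate it either. A mature product-layer system exposed at this scale has enormous demonstration value for the industry. Competitors obviously cannot legally copy it outright, but they can draw very dense inspiration from its architectural orientation, module boundaries, permission boundaries, memory organization, subagent division of labor, UI/CLI workflow, and more. Axios and several other outlets have also noted that the leak exposed unreleased features and architectural details, which amounts to handing competitors a blueprint.

More importantly, it will accelerate an industry consensus: an agent is not a prompt trick but a discipline of systems software engineering.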

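XII. Finally, a blunter conclusion

If you see this Claude Code TypeScript leak only as an embarrassing release accident, you are seeing it too small.

Its real historical significance is that it made the whole industry suddenly see one fact:

Today’s strongest agent product stopped being “a large model that can write code” long ago.
It is “a miniature operating system with a large model at its core, context as its fuel, tools and skills as its limbs, subagents as its division of trades, memory as its continuity, sandbox and permission policy as its boundaries, and hooks and the SDK as its extension ports.”

The large model is responsible for understanding and deciding.
The harness is responsible for turning that understanding into real execution that is sustainable, auditable, interruptible, and recoverable.

What will truly change the software world is not that models can talk.
It is that models have finally grown hands and feet, and that those hands and feet no longer run unguarded.
An entire system-level harness has tamed them into “digital engineering entities” that can work over the long term.

That is the part of the Claude Code leak most worth studying.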

What the Claude Code TypeScript Leak Really Revealed

A rare x-ray of a frontier coding agent—and why the real story is the harness, not the model

The accidental leak of Claude Code’s TypeScript source was instantly treated as a spectacle: a top-tier AI company shipping its own internals to the public by mistake, the community pouncing on the package within hours, mirrors spreading everywhere, and social media doing what social media always does when blood is in the water. But the real importance of the incident lies somewhere else.

For once, the industry got to peek behind the curtain of a production-grade coding agent—not the model weights, not the training data, not the secret sauce of pretraining, but something arguably more important for the next phase of AI systems: the product-layer machinery that turns a language model into a long-running, tool-using, semi-autonomous software worker.

Multiple outlets reported that the leak came from an npm release of @anthropic-ai/claude-code in which a large JavaScript sourcemap file was mistakenly included, allowing observers to reconstruct the original TypeScript source. Anthropic said the incident was caused by human error in packaging, not by a breach, and that no customer data or credentials were exposed. Reports consistently placed the exposed codebase at roughly 512,000 lines spanning around 1,900 files, enough to give outsiders a surprisingly detailed view of Claude Code’s architecture and internal product logic.

That distinction matters. This was not a model leak. It was not the release of frontier weights, and it did not suddenly flatten the underlying capability gap between labs. What leaked was the executable skeleton around the model: the code that manages context, orchestrates tool use, enforces permissions, carries state forward, and makes an agent viable over many steps instead of one. In other words, what leaked was not the “mind” of the system, but something closer to its nervous system, musculature, and operating discipline.

That is why the event matters far beyond Anthropic’s embarrassment. It exposed, in unusually concrete form, what the next competitive frontier in AI really looks like. The industry has spent the last two years obsessing over models. Increasingly, the harder problem is not how to make a model answer a question. It is how to make that model work for forty minutes, or four hours, across tools, files, commands, failures, interruptions, and handoffs, without collapsing into confusion or becoming unsafe. Anthropic’s own engineering writing has been moving in exactly this direction for months: away from prompt tricks, and toward context engineering, tool design, agent evaluation, sandboxing, and harness design for long-running tasks.

That shift is the real story.

The leak was interesting because it exposed a system, not a demo

There is a huge difference between an impressive AI demo and a productized agent. A demo shows that a model can do something once. A productized agent has to do it repeatedly, under constraints, with partial failures, ambiguous user intent, changing state, and real permissions. It has to survive success, survive error, and survive boredom. It has to keep working after the novelty wears off.

By the time this leak happened, Claude Code was already clearly far beyond the stage of “an LLM in a terminal.” Anthropic’s documentation and engineering posts describe a system with structured tools, context management, memory layers, subagents, hooks, permission modes, SDK support, and security controls designed specifically for real-world, iterative work. Anthropic has even described Claude Code as a flexible agent harness, which is a telling phrase: not just an assistant, not just a shell wrapper, but a runtime system for sustained model-driven execution.

That language is not cosmetic. It reflects a deep architectural truth. Once an AI system is expected to act rather than merely answer, the harness becomes first-class. The harness is what decides what enters the model’s context, what tools are exposed, what outputs are executable, how risk is bounded, how history is compressed, and how work resumes after interruption. The harness is what lets a model stop being a brilliant intern and start becoming a usable operator.

This is why the leak was so revealing. It made visible the fact that a frontier coding agent is not merely “LLM plus API calls.” It is a layered execution environment.

The architecture we should really be talking about

The cleanest way to understand what Claude Code appears to represent is as an early form of an agent operating system. Not an operating system in the old desktop sense, of course, but an execution layer sitting between human intent and the messy world of files, commands, network access, external tools, and long-lived work.

At the top sits the cognitive layer: the model itself. This is the part that interprets goals, plans steps, decides whether to inspect or edit, whether to run a command, whether to consult a tool, whether to delegate, whether to stop, and whether to revise a previous approach. Anthropic’s own framing of agents is useful here: unlike fixed workflows, agents are systems in which the LLM dynamically directs its own process and tool usage.

Beneath that is the context layer, which is far more important than most people realized during the first wave of prompt engineering. Anthropic’s context engineering work defines the problem as curating and maintaining the optimal set of tokens during inference—not just a prompt, but everything that lands in the model’s window: system instructions, conversation history, tool schemas, retrieved state, memory summaries, and external context. The point is not verbosity. The point is getting the right state into the right place at the right time, while staying within budget.
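
To make "the right state in the right place, within budget" concrete, here is a minimal sketch of budgeted context assembly. Everything in it is my own illustration rather than anything taken from Claude Code: the `ContextItem` type, the priority convention, and the four-characters-per-token estimate are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str      # e.g. "system", "history", "memory_summary"
    text: str
    priority: int   # lower number = more important (hypothetical convention)


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. A real system
    # would use the model's own tokenizer instead.
    return max(1, len(text) // 4)


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most important items that fit within the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen


items = [
    ContextItem("system", "You are a coding agent." * 2, priority=0),
    ContextItem("history", "x" * 4000, priority=2),
    ContextItem("memory_summary", "Tests pass on main; auth module is mid-refactor.", priority=1),
]
selected = assemble_context(items, budget=100)
print([i.label for i in selected])
```

With a budget of 100 estimated tokens, the oversized raw history is left out while the system instructions and the short memory summary both fit. A production harness would more likely summarize the history than drop it, but the selection pressure is the same.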

Then comes the capability layer: tools, skills, subagents, MCP-connected services, hooks, code execution, and the interface contracts through which the model can do real work. Anthropic’s engineering guidance on tools is blunt and correct: tools are the contract between deterministic systems and nondeterministic agents, which means they cannot be designed as if the caller were always a careful human programmer. They must be understandable to the model, robust to ambiguity, and economical in how they return usable context for the next reasoning step.

Below that sits the execution and safety layer. This is where many agent demos quietly die when exposed to reality. If the system can read files, edit code, run shell commands, browse networks, and touch external services, then it needs enforcement—not vibes, not promises, but hard boundaries. Anthropic’s sandboxing work makes this point clearly: if you want to reduce user interruption without inviting disaster, you need OS-level controls such as filesystem isolation and network restriction. In their write-up, the emphasis is not on polite model behavior but on containment via operating-system primitives. That is exactly the right instinct.

Finally, there is the continuity layer: everything needed for long-running work to remain coherent across time. This is where “chatbot thinking” breaks down. Long tasks span multiple context windows. They pause, resume, compress, branch, and sometimes recover after failure. Anthropic’s engineering writing on long-running agents explicitly calls out this challenge: an agent can do good work inside a single context window, but making consistent progress across many such windows is still an open systems problem.
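
A small sketch can show what "progress across many windows" means mechanically. The assumption here is that task state lives in a file outside the model's context, so each new window starts by loading it. The file layout and the `run_window` stand-in are hypothetical; a real harness would call the model where this code simply pops a to-do item.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a half-written checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def load_checkpoint(path: Path) -> dict:
    if not path.exists():
        return {"done": [], "todo": [], "summary": ""}
    return json.loads(path.read_text())


def run_window(state: dict, max_steps: int) -> dict:
    """Simulate one context window: finish up to max_steps tasks."""
    for _ in range(max_steps):
        if not state["todo"]:
            break
        state["done"].append(state["todo"].pop(0))
    state["summary"] = f"{len(state['done'])} done, {len(state['todo'])} left"
    return state


with tempfile.TemporaryDirectory() as d:
    ckpt = Path(d) / "task.json"
    save_checkpoint(ckpt, {"done": [], "todo": ["map repo", "edit", "test"], "summary": ""})
    # The first window does two steps, then the session "ends".
    save_checkpoint(ckpt, run_window(load_checkpoint(ckpt), max_steps=2))
    # A fresh window resumes from the file, not from conversation history.
    final = run_window(load_checkpoint(ckpt), max_steps=2)

print(final["summary"])
```

The point of the design is that the second window never sees the first window's transcript. It sees only the compact state the first window chose to leave behind.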

Put those layers together and the picture becomes clear. A serious agent is no longer just a model. It is a control plane.

Why the most important word here is “harness”

“Harness” may sound like humble engineering terminology, but it is quickly becoming one of the defining words of the agent era.

A harness is the difference between a clever system and a dependable one. It is what transforms a raw generative model into a bounded actor that can perceive, plan, act, recover, and continue. The model reasons. The harness operationalizes that reasoning.

This is not a semantic distinction. It is the central engineering challenge of the field. Anthropic has been unusually explicit about this in its public writing. Their posts on long-running agents, tool design, multi-agent research, and agent evaluation all converge on the same principle: if you want real-world agentic performance, you must stop treating the model in isolation. Evaluation must include the transcript and the outcome. Tool interfaces must be engineered for model use. Context must be curated rather than dumped. State must be compressed across sessions. Autonomy must be mediated by permissions and environment controls.

That is what the leak inadvertently dramatized. The exposed code appears to have fascinated people not because it contained mystical prompts, but because it showed the accumulated scaffolding required to make an agent actually run. Even media coverage of more playful findings—such as references to a Tamagotchi-style pet or an internal “KAIROS” mode suggestive of a more always-on agent behavior—was interesting mainly because it hinted at a system that was already far more productized and exploratory than a public CLI façade would suggest. Those features were reported from code analysis and media review, not from official feature launches, so they should be treated cautiously. But even as signals, they reinforce the broader point: the product surface is only the visible edge of a much deeper execution architecture.

Long-running tasks are where the romance ends and the engineering begins

The industry has become very good at showcasing one-shot intelligence. Ask a hard question, get a sharp answer. Request a file edit, receive a plausible patch. That is the easy part, or at least the easier part.

The much harder problem is longitudinal coherence. Can the system stay useful after thirty tool calls? Can it remember what it already verified? Can it summarize its own work productively rather than dragging a giant transcript forever? Can it stop repeating failed actions? Can it resume from a checkpoint without becoming a different personality with amnesia? Can it work under constrained permissions without constant babysitting?

This is where modern agents either become infrastructure or stay toys.

Anthropic’s public materials make clear that Claude Code tackles this not by pretending every session is one endless conversation, but by treating continuity as a separate engineering concern. Their documentation around memory shows that sessions begin with fresh context windows, while persistent project knowledge can be reintroduced through artifacts such as CLAUDE.md and auto-loaded memory. That is a subtle but important design choice. It rejects the fantasy that bigger windows alone solve persistence. Instead, it treats persistence as a state-management problem: what should be carried forward, in what form, and at what granularity.

That design instinct is more profound than it may first appear. Long context is not memory in the full systems sense. It is a larger desk, not a durable institutional mind. Real memory for agents has at least three distinct forms.

One is task state: what has already been done, what remains open, and what the current frontier of work is. Another is policy memory: the rules, conventions, and preferences that should shape behavior across sessions. A third is experiential memory: what approaches worked, what failed, and what patterns the system should prefer next time.

The harness has to decide how these are stored, when they are retrieved, and how they are compressed so they remain useful instead of becoming token sludge. That is not the model’s “natural intelligence.” That is systems engineering.
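
As a rough illustration of those three memory forms, the sketch below keeps them in separate fields and renders them into a size-limited preamble. The class, the character limit, and the rule of dropping the oldest experience first are my assumptions, not a description of how any real product stores memory.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    task_state: list[str] = field(default_factory=list)   # what is done / still open
    policy: list[str] = field(default_factory=list)       # rules and conventions
    experience: list[str] = field(default_factory=list)   # what worked / what failed

    def render(self, max_chars: int) -> str:
        """Build a memory preamble, dropping the oldest experience first."""
        experience = list(self.experience)
        while True:
            sections = [
                "POLICY:\n" + "\n".join(self.policy),
                "TASK STATE:\n" + "\n".join(self.task_state),
                "EXPERIENCE:\n" + "\n".join(experience),
            ]
            text = "\n\n".join(sections)
            if len(text) <= max_chars or not experience:
                return text
            experience.pop(0)  # the oldest lesson is dropped first


mem = AgentMemory(
    task_state=["migrated users table", "open: backfill emails"],
    policy=["never push to main"],
    experience=["attempt 1: sqlite lock, retried", "attempt 2: batch size 500 worked"],
)
preamble = mem.render(max_chars=140)
print(preamble)
```

Policy and task state are never trimmed in this sketch, on the assumption that forgetting a rule or the current frontier of work costs more than forgetting an old lesson. Other orderings are defensible.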

Tools are not APIs anymore—at least not in the old sense

One of the most consequential implications of this leak is what it says about the future of software interfaces.

For the app era, APIs were built mainly for programmers. They assumed explicit calls, disciplined arguments, deterministic control flow, and external orchestration. In the agent era, that is no longer enough. The caller is often a probabilistic planner operating through language and partial context. It may misunderstand boundaries, misuse a tool, or invoke the right capability at the wrong moment. The interface therefore has to be legible not just to humans, but to models.

Anthropic’s guidance on writing effective tools for agents makes exactly this point. Tools should have clear names, clear boundaries, concise but informative descriptions, and outputs that help the model make the next decision rather than merely dumping raw data. This is more than documentation polish. It is a new interface discipline.
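
Here is one way that discipline might look in code. This is a hedged sketch only: the `Tool` type, the in-memory repository, and the `read_file` tool named in the hint are all hypothetical, and none of it reproduces a real SDK.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str              # written for the model, like a small prompt
    run: Callable[[dict], dict]


def search_files(args: dict) -> dict:
    # Hypothetical in-memory "repository" so the sketch runs anywhere.
    repo = {"src/auth.py": "def login(): ...", "src/db.py": "def connect(): ..."}
    query = args.get("query")
    if not query:
        # Recoverable error: say what went wrong and how to fix the call.
        return {"ok": False, "error": "missing 'query'",
                "hint": "pass query='<text to find>'"}
    hits = [path for path, body in repo.items() if query in body]
    # Return a short, decision-ready summary instead of raw file contents.
    return {"ok": True, "matches": hits[:5], "total": len(hits),
            "next": "read a match with read_file" if hits else "try a broader query"}


search_tool = Tool(
    name="search_files",
    description="Find files whose contents contain a text query. "
                "Returns at most 5 paths. Does not modify anything.",
    run=search_files,
)

print(search_tool.run({"query": "login"}))
print(search_tool.run({}))
```

The description states the boundary ("does not modify anything"), the output is capped, and the failure case tells the caller how to retry. Those three properties are the ones a probabilistic caller benefits from most.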

That is why I increasingly think the old vocabulary—API, plugin, extension—does not quite capture what is emerging. A high-quality agent skill is not just a wrapped endpoint. It is an executable capability unit designed for model planning, model invocation, error recovery, policy enforcement, observability, and often token efficiency. It is closer to a syscall with documentation, guardrails, and telemetry than to a classic web API.

This is also why capability density may matter more than raw model parity in the next competitive phase. Once leading models are all reasonably capable, the decisive difference may be the richness and quality of the harnessed capability environment: how many reliable skills exist, how composable they are, how well they are described, how safely they execute, how efficiently they pass context, and how well they integrate into longer task loops.

In that world, the ecosystem moat shifts upward. The battle is no longer only about who has the smartest model. It is also about who has the most usable action surface.

Multi-agent systems only matter if they improve division of labor

The leak also adds fuel to another active debate: whether multi-agent architectures are genuinely useful or just elaborate theater.

Here again, Anthropic’s public engineering perspective is more sober than much of the discourse. In its write-up on the company’s multi-agent research system, the key challenge is not “more agents equals more intelligence.” It is delegation. The orchestrator must know when to hand work off, how to specify the task, how to constrain the subagent, and how to turn partial results into progress without wasting effort or creating contradictory work streams.

That is the right framing. Multi-agent systems make sense when they create cleaner division of labor. A read-only exploration agent can map the repository. A planning agent can structure the work. An execution agent can edit and run tests. A verification layer can judge outputs. A human can step in only at leverage points. This is not “a bunch of bots chatting.” It is a labor system.

Seen that way, subagents are not an indulgence. They are the first signs of specialization inside AI runtime environments. Once tasks become large enough, one generalized process becomes clumsy. You want bounded workers, each with specific tools, scopes, and expected outputs. That is not unlike how modern computing systems evolved from single-process simplicity to structured concurrency and process isolation.
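
The idea of bounded workers can be sketched in a few lines. The roles and tool names below are made up for illustration; the only point is that each subagent carries an explicit tool scope and the harness, not the model, enforces it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentSpec:
    role: str
    allowed_tools: frozenset[str]
    expected_output: str


# Hypothetical roles: a read-only explorer and an executor that may edit.
EXPLORER = SubagentSpec("explorer", frozenset({"read_file", "search_files"}), "repository map")
EXECUTOR = SubagentSpec("executor", frozenset({"read_file", "edit_file", "run_tests"}),
                        "patch + test report")


def call_tool(agent: SubagentSpec, tool: str) -> str:
    """Refuse any tool outside the subagent's declared scope."""
    if tool not in agent.allowed_tools:
        raise PermissionError(f"{agent.role} may not use {tool}")
    return f"{agent.role} ran {tool}"


print(call_tool(EXECUTOR, "edit_file"))
try:
    call_tool(EXPLORER, "edit_file")
except PermissionError as err:
    print("blocked:", err)
```

Making the spec frozen is a deliberate choice in this sketch: a subagent should not be able to widen its own scope at runtime.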

The lesson is simple: multi-agent is not a religion. It is organization design.

Safety, in practice, means the model does not get to be trusted by default

One of the deeper ironies of the Claude Code leak is that it hit a company whose public identity is heavily tied to safety. That irony wrote itself on social media. But the more interesting observation is technical.

When people say “AI safety,” many still imagine abstract alignment discourse or content filtering. Yet in real agent systems, a huge fraction of practical safety is operational: what can the agent access, what can it execute, what network paths are open, what approvals are required, and how exceptions are handled when the model confidently heads in the wrong direction.

Anthropic’s engineering material on sandboxing and permissions points toward a mature answer. Permissions alone are not enough if they require the human to approve every move. That destroys flow and keeps the system from becoming truly useful. But letting the model run without constraints is equally untenable. The way forward is layered enforcement: policy classifiers, execution sandboxes, file and network boundaries, and extension points such as hooks where custom organizational policies can be injected.
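
A toy version of layered enforcement might look like the following. It is a sketch under loud assumptions: the sandbox root, the host allow-list, and the hook signature are invented for this example, the policy classifier layer is omitted entirely, and the checks run in application code, whereas the whole argument above is that real containment belongs in OS primitives. The sketch shows only the ordering of the layers.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

Action = dict  # e.g. {"tool": "write_file", "path": "/work/app/main.py"}
Hook = Callable[[Action], Optional[str]]  # returns "allow", "deny", or None

SANDBOX_ROOT = PurePosixPath("/work")   # hypothetical writable root
ALLOWED_HOSTS = {"pypi.org"}            # hypothetical network allow-list


def inside_sandbox(path: str) -> bool:
    p = PurePosixPath(path)
    return p.is_absolute() and ".." not in p.parts and SANDBOX_ROOT in (p, *p.parents)


def decide(action: Action, hooks: list[Hook]) -> str:
    # Layer 1: hard boundaries that no hook or model output can override.
    if "path" in action and not inside_sandbox(action["path"]):
        return "deny"
    if "host" in action and action["host"] not in ALLOWED_HOSTS:
        return "deny"
    # Layer 2: organization-specific hooks get the next say.
    for hook in hooks:
        verdict = hook(action)
        if verdict in ("allow", "deny"):
            return verdict
    # Layer 3: anything still undecided goes to a human.
    return "ask"


def no_secrets_hook(action: Action) -> Optional[str]:
    return "deny" if ".env" in action.get("path", "") else None


def reads_are_fine_hook(action: Action) -> Optional[str]:
    return "allow" if action["tool"] == "read_file" else None


hooks = [no_secrets_hook, reads_are_fine_hook]
print(decide({"tool": "read_file", "path": "/work/app/main.py"}, hooks))
print(decide({"tool": "read_file", "path": "/etc/passwd"}, hooks))
print(decide({"tool": "read_file", "path": "/work/.env"}, hooks))
print(decide({"tool": "write_file", "path": "/work/app/main.py"}, hooks))
```

Only the last case reaches a human. Routine reads inside the boundary proceed without interruption, and anything outside the boundary is refused before a hook or a person is ever consulted.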

That is a fundamentally important design philosophy. It says that reduced human interruption should come not from blind trust in the model, but from stronger environmental guarantees around it. In other words, you do not make autonomy safe by teaching the tiger manners. You make it safe by building the enclosure properly.

This is also where the phrase “OS-level harness” becomes more than metaphor. Once agent systems interact with the real world, they start inheriting the old truths of operating systems and security engineering: privilege separation matters, isolation matters, explicit boundaries matter, auditability matters, and resumability matters. The romance of “AI that just figures it out” runs into the granite of systems design.

What the industry should learn from this moment

It would be easy to reduce the whole affair to a cautionary tale about release engineering, and it certainly is that. A misconfigured packaging process or an overlooked sourcemap can expose an extraordinary amount of internal detail. The operational lesson is obvious and a bit humiliating: modern AI companies, no matter how sophisticated, are still software companies, and software companies can still trip over the oldest rake in the yard.

But that would be the shallow lesson.

The deeper lesson is that frontier agent systems are now being built as full-stack execution environments. The model is still central, but it is no longer the whole product. Context curation, memory persistence, tool ergonomics, task orchestration, sandboxing, permissions, subagent specialization, evaluation methodology, and session-to-session continuity are all becoming part of the competitive core. Anthropic’s public work has effectively been spelling this out for over a year; the leak merely made the abstract thesis concrete.

That is why this incident will likely matter more as a strategic signal than as a one-off embarrassment. Competitors did not gain model weights, but they gained something almost as valuable for the near term: a sharper picture of how one of the leading coding agents is assembled into a production system. Even if no one can simply clone the whole thing, the leak accelerates convergence around architecture patterns. It teaches by exposure.

And perhaps most importantly, it nudges the broader AI conversation toward the right level of abstraction. The real frontier is no longer just intelligence in the narrow sense. It is controlled, sustained, economically useful agency.

The bigger picture: agents are becoming a new execution layer for software

If there is one conclusion worth carrying forward, it is this:

The future of agents is not “a better chatbot.” It is a new execution layer between human intent and software reality.

In the app era, users navigated menus, forms, dashboards, tabs, and icons. In the API era, developers stitched services together manually. In the agent era, the user increasingly declares intent, and a model-centered runtime translates that intent into a sequence of bounded actions across tools, files, services, and state.

That runtime needs memory. It needs policy. It needs permissions. It needs a tool contract. It needs recovery logic. It needs evaluation. It needs observability. It needs all the dull, durable things that software needs when it stops being a trick and starts becoming infrastructure.

Claude Code, as glimpsed through this leak and through Anthropic’s own public architecture writing, looks less and less like “an assistant that can code” and more like an early agent operating environment for software work. That is why the leak was so revealing. It showed that behind the glamour of modern AI lies a quieter but far more consequential truth:

The model may provide the intelligence, but the harness provides the agency.

And in the long run, agency is where the real systems battle will be fought.

When Agents Become the Default Gateway, Will the Operating System Be Rewritten?

The question isn’t “will it happen?” It’s already happening. Just not in the way we’re used to.

The Operating System in the Agentic AI Era

I. The history of operating systems is, at its core, a war over the front door

Each generation of operating systems didn’t merely improve kernels. It reorganized the entry point—how humans express intent.

DOS: the command line was the entry point.
Windows / macOS: the desktop GUI became the entry point.
iOS / Android: app icons became the entry point.
The web era: the browser became the entry point.

The strategic heart of an operating system has never been the kernel. It’s the question: how does a user make something happen?

Change the front door, and the entire software ecosystem gets reshuffled.

II. Agents change the way intent is expressed

In the old model, doing something looked like this:

You want something done → open an app → find the feature → click through the workflow.

In the agentic model, the loop becomes:

You want something done → tell an agent → the agent orchestrates the system.

This is not a feature upgrade. It’s the disappearance of the old entry point. Recent “OS-level agent” moments—whether you look at stunning phone demos like Doubao’s, or the grassroots explosion around OpenClaw—make one thing unusually vivid: when users stop opening apps and agents start calling them, apps stop being the front door. They become capability modules.

In that world, the operating system is no longer organized around an “app launcher.” It’s organized around a permission orchestrator.

That is the structural change.

III. When the agent becomes the default entry point, three things happen to the OS

3.1 UI moves to the second row

The UI doesn’t disappear, but it stops being the center of gravity. The interface becomes a governance tool, not an operation tool. It naturally splits into three roles:

a visualization layer
an approval layer
an audit layer

The real execution logic lives in the background orchestration layer. Icons shrink in importance. Menus fade. “Workflows” get flattened.

(1) Visualization layer
In traditional software, the UI is a control panel: you press buttons to cause actions.

In the agent era, actions happen in the background. The UI’s primary job is to tell you what happened:

what the agent plans to do
what it is doing right now
what it has completed

If the agent books your flights, reorganizes your files, refactors your code, or runs a batch of API calls, you don’t click through each step. You supervise the plan and the outcome. The UI becomes closer to an aircraft instrument panel than a steering wheel.

(2) Approval layer
This layer becomes critical the moment agents gain execution authority. Some actions must require explicit human confirmation:

deleting 2,000 files
wiring $5,000
signing a contract
sending sensitive data outside the organization

Now the UI isn’t a collection of “features.” It’s a set of risk checkpoints. Its core function is not “click to do,” but “authorize or deny.”

It must show:

risk level
blast radius
confirm / reject controls

This is the UI as the human’s final vote.
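
A risk checkpoint of this kind could be modeled roughly as follows. The thresholds and field names are hypothetical and chosen only so that the examples above (2,000 files, $5,000) land in the high-risk bucket.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ApprovalRequest:
    action: str
    risk: Risk
    blast_radius: str   # human-readable description of what is affected


def classify(action: str, count: int = 0, amount_usd: float = 0.0) -> ApprovalRequest:
    # Hypothetical thresholds chosen only for illustration.
    if amount_usd >= 1000 or count >= 1000:
        risk = Risk.HIGH
    elif amount_usd > 0 or count > 10:
        risk = Risk.MEDIUM
    else:
        risk = Risk.LOW
    radius = f"{count} item(s)" if count else f"${amount_usd:,.2f}"
    return ApprovalRequest(action, risk, radius)


def needs_human(req: ApprovalRequest) -> bool:
    return req.risk is not Risk.LOW


req = classify("delete files", count=2000)
print(req.risk.name, req.blast_radius, needs_human(req))
```

The approval UI then has exactly three things to render per request: the risk level, the blast radius, and a confirm or reject control.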

(3) Audit layer
If an agent can execute continuously, you can’t watch every step. The OS must surface accountability:

execution logs
tool and API call traces
permission usage history
resource consumption (tokens, API spend, data egress)
anomaly alerts

This looks less like a classic app UI and more like:

a bank statement
a cloud access log
a flight recorder

The UI becomes an interface for responsibility. It doesn’t help you “do the work.” It helps you know what happened—and assign blame when something goes wrong.
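
One familiar way to make such a record trustworthy is a hash-chained, append-only log, where each entry commits to the one before it. That technique is my own choice for this sketch, not something the platforms discussed here are known to use, and the event fields are invented.

```python
import hashlib
import json


def append_entry(log: list[dict], event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    log.append({"prev": prev, "event": event,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edit to past events breaks the chain."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps({"prev": prev, "event": entry["event"]}, sort_keys=True)
        expected = hashlib.sha256(payload.encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"tool": "read_file", "path": "invoices/2023.csv", "tokens": 812})
append_entry(log, {"tool": "send_email", "to": "cfo@example.com", "tokens": 140})
intact = verify(log)
log[0]["event"]["path"] = "payroll.csv"   # tamper with history
tampered_ok = verify(log)
print(intact, tampered_ok)
```

A chain like this does not stop tampering, but it makes tampering detectable, which is what a flight recorder is for.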

Put side by side, the shift is stark.

Traditional app UI:
menus, buttons, forms, step-by-step workflows

Agent-era UI:
plans, summaries, risk prompts, permission grants, audit trails

You are no longer the operator. You are the supervisor.

And that’s not just an interaction change—it’s philosophical.

Before: humans operate; software executes.
After: agents operate; humans arbitrate.

So the UI naturally migrates toward feedback, authorization, and oversight.

A concrete example
Imagine a future macOS where you say:

“Turn last year’s client invoices into a financial report.”

The agent quietly:

searches files
extracts data
calls spreadsheet tooling
uses email APIs if needed
generates a PDF

And the UI shows only:

a plan of steps
a warning: 3 anomalous files detected
a lock: authorize access to the finance folder?
a result: report generated

You didn’t “open” any app. You supervised. The UI didn’t vanish—it evolved from a control panel into a responsibility panel. And whoever controls that panel controls the final decision.

That is what the OS must defend.

3.2 The permission system becomes the core asset

Classic OS security models are built around:

file permissions
process isolation
sandboxing

But the agent era demands something more dynamic:

just-in-time permission grants
temporary execution authorization
revocable capability interfaces
verifiable execution logs

The OS shifts from a resource management system into a governance system for delegated execution.
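
The difference from static file permissions is easiest to see in code. In this sketch a grant is time-bound and revocable, and the clock is passed in explicitly so the behavior is easy to test. The capability strings and the `GrantStore` class are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Grant:
    capability: str       # e.g. "read:finance_folder"
    expires_at: float     # seconds on whatever clock the caller uses
    revoked: bool = False


class GrantStore:
    def __init__(self) -> None:
        self._grants: list[Grant] = []

    def grant(self, capability: str, now: float, ttl: float) -> Grant:
        g = Grant(capability, expires_at=now + ttl)
        self._grants.append(g)
        return g

    def revoke(self, g: Grant) -> None:
        g.revoked = True

    def allowed(self, capability: str, now: float) -> bool:
        return any(
            g.capability == capability and not g.revoked and now < g.expires_at
            for g in self._grants
        )


store = GrantStore()
g = store.grant("read:finance_folder", now=0.0, ttl=60.0)
print(store.allowed("read:finance_folder", now=30.0))   # within the window
print(store.allowed("read:finance_folder", now=61.0))   # after expiry
```

Access is the exception that must be re-earned, rather than a standing property of the process. Revoking the grant early has the same effect as letting it expire.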

3.3 APIs rise; apps fade

When agents are the default gateway, UI value goes down and API value goes up. The ecosystem starts to look like:

foreground: one “super agent”
background: countless capability interfaces

In that world, the App Store itself may morph—from an “app market” into a “skill market.” Users don’t download apps; agents call capabilities. Distribution is rewritten.

IV. Why big platforms don’t fully open the gates

Because once an agent becomes the default entry point:

OS vendors lose the privileged control that UI once provided
the app ecosystem gets abstracted into a capability layer
revenue models face renegotiation

If every iPhone app becomes a background capability and the user interacts primarily through an agent, do app icons still matter? Does the 30% toll still feel defensible?

Entry-point control is profit control. That is why platform players ship agent features cautiously and incrementally.

When a product like Doubao pushes toward OS-level agency and triggers visible pushback, it’s not mysterious what it threatens. But the direction is hard to reverse: once consumers taste the productivity of an OS-level agent, they rarely want to go back to tapping through menus.

V. OpenClaw is a preview of an “ungoverned OS”

OpenClaw is, in essence, a simplified shell of an agent operating system.

It lacks mature permission governance. It lacks compliance frameworks. It lacks serious auditing. And yet it demonstrates a key fact:

model + permission orchestration + local execution is already enough to simulate a micro-OS.

That is why it shocks people. Not because it invented new intelligence, but because it shows what happens when you attach intelligence to execution without governance.

VI. The real future shape

When agents become the default gateway, the operating system becomes:

a permission allocation platform
an execution-log platform
a capability marketplace
a risk-control hub

UI gets simpler. Apps become invisible. Capabilities become modular.

The user sees a conversational entry point. Underneath is a governance engine for delegated action.

VII. Final judgement

Agents will not eliminate operating systems. They will force operating systems to evolve—from “resource schedulers” into “arbiters of delegated execution.”

The core asset in the agent era is the power to define boundaries:

what can be done
by whom
under what permissions
with what logs and accountability

Whoever defines those boundaries becomes the next platform.

https://liweinlp.com/category/english

When Agents Become the Default Gateway, Does the App Store Model Collapse?

My answer: not immediately. But its structural profits will be quietly, steadily eroded—and the way it happens is subtle enough that many people won’t notice until the numbers start to move.

In the mobile era, we got used to a simple truth: if you control the home screen, you control the money. The App Store was never just a software catalog. It was a tollbooth placed at the one place users had to pass through.

That premise is what the Agent era challenges.

I. The App Store Doesn’t Really Sell Apps—It Sells Gatekeeping
The App Store’s core asset has never been “distribution” in the neutral, technical sense. Distribution is a commodity now. What the App Store truly owns is the gate:

the default user entry point
the power to route attention and traffic
control of the payment rail
the right to tax the ecosystem

In the classic mobile loop, the sequence looks like this:

user → opens an app → uses a service
platform controls the entry point → takes ~30%

That structure works for one reason: the user must consciously open the app. As long as the app icon is the front door, the platform owns the doorframe—and can charge rent.

II. The Fatal Change in the Agent Era: Apps Stop Being the Entry Point
Once an agent becomes the default gateway, the flow changes into something like:

user → tells the agent → agent dispatches capabilities → calls an app’s backend APIs

The key shift is psychological as much as architectural: the user no longer “opens an app.” The app becomes a background capability provider.
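
That flow can be sketched as a capability registry. All names here are invented, including the "RideCo" provider, and the step where a model chooses which capability to call is left out; the sketch covers only registration and dispatch.

```python
from typing import Callable

# capability name -> (provider label, handler); all names are hypothetical
REGISTRY: dict[str, tuple[str, Callable[[dict], str]]] = {}


def register(capability: str, provider: str):
    """Decorator that exposes a backend function as an agent-callable capability."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        REGISTRY[capability] = (provider, fn)
        return fn
    return wrap


@register("book_ride", provider="RideCo backend")
def book_ride(args: dict) -> str:
    return f"ride booked to {args['destination']}"


def dispatch(capability: str, args: dict) -> dict:
    provider, handler = REGISTRY[capability]
    # The user sees only the result; the provider is recorded for attribution.
    return {"result": handler(args), "provider": provider}


outcome = dispatch("book_ride", {"destination": "airport"})
print(outcome)
```

Notice where the brand ends up: as a metadata field in the dispatch result, not as an icon the user tapped. That is the demotion the section describes.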

And when the user can’t even tell which app is being used, two things happen at once:

brand gravity weakens
entry-point value decays

Traffic follows the new front door. Whoever controls the agent increasingly controls attention and intent. And that is the App Store’s structural threat in one sentence.

III. The App Store Won’t Disappear—But It Can Be Hollowed Out
This won’t look like a dramatic collapse. It will look like slow “hollowing,” where the storefront still exists, but its economic center of gravity shifts. Three changes are likely.

First: fewer UI-heavy apps.
A large class of utility apps—especially those built around routine workflows—will be absorbed into agent behavior:

calendar coordination
lightweight editing
information aggregation
copy-and-paste data movement

These become invisible background functions. Users may not know which product is powering the result, and they won’t care—until someone asks who gets paid.

Second: the commission logic gets challenged.
If an agent can complete a purchase by calling a cloud API directly—without going through an in-app purchase flow—the traditional platform toll lane can be bypassed.

The 30% model works best when the platform owns the transaction surface. Agents, by design, prefer capability surfaces: web APIs, service endpoints, programmable commerce. That route is harder to tax.

Third: a “skills market” starts to replace an “apps market.”
It’s not hard to imagine an ecosystem that looks more like:

agent skill marketplaces
capability modules / plugins as tradable units
API ecosystems designed for agent orchestration

In that world, the store doesn’t vanish. It mutates. It stops selling “apps” as user-facing products and starts selling “capabilities” as agent-callable services. That’s a form shift—not an extinction event.

IV. The Real Conflict Isn’t the App Store—It’s Who Owns the Default Agent
The strategic question is not whether an App Store survives. The strategic question is: who becomes the default agent?

If it’s Apple’s agent, the App Store is absorbed and reinterpreted inside a new orchestration layer.

If it’s an OpenAI/Anthropic-style agent, the platform can be partially bypassed—relegated to infrastructure while value capture migrates elsewhere.

If it’s a local, open-source agent (think OpenClaw-like trajectories), then platform rent extraction weakens: the platform remains in the chain, but with far less bargaining power.

Once entry-point control shifts, profit follows. This is the true reason platforms are anxious. It’s not a debate about UX. It’s a battle over who owns the choke point.

V. Why Big Platforms Move So Carefully on Agents
This is why the largest platforms push agents with visible caution. They are walking a tightrope.

If their agent is too strong:

users open fewer apps
platform commission pressure increases
developer economics get restructured

If their agent is too weak:

users migrate to third-party agents
entry-point control gets stolen
the platform becomes a “hardware shell” around someone else’s brain

It’s a delicate game. The likely strategy is not “build an agent that replaces apps,” but “build an agent that strengthens the existing ecosystem while preventing displacement.”

Agents won’t directly destroy the App Store. But they can demote it—from an entry-point platform into a capability supply market.

Entry-point value compresses. Profit formulas get rewritten. And the ultimate winner is not the party who sells apps, but the party who defines the orchestration rules.

VI. The Final Question
The mobile internet era rewarded whoever controlled the entry point.

The agent era will reward whoever controls intent interpretation and execution scheduling.

When a user says just one sentence—“get this done for me”—the person (or system) deciding where the request gets routed is the one deciding where the money flows.

At that moment, the most valuable asset is no longer the app icon on the home screen.

It’s the agent in the background doing the dispatch.

https://liweinlp.com/category/english

The Great Software Shake-Up of the Agent Era — Starting with OpenClaw

I. OpenClaw is a structural event.

What makes OpenClaw shocking isn’t a new algorithm. It’s the fact that it exposes a new reality:

LLM capability + local execution privileges + open-source scaffolding is already enough to rewrite how software gets produced.

When a solo developer can stitch together an agent with something close to “OS-level permissions” using off-the-shelf models and open frameworks, it tells us something uncomfortable yet important: raw capability is no longer scarce. The scarce variable is now composability—the ability to combine tools, permissions, and workflows into outcomes.

And composability isn’t linear. It’s exponential. When your building blocks are callable functions, “more blocks” doesn’t add—it multiplies.

II. Why “80% of software” gets swallowed

Once agents can:

understand natural language intent directly,
break a task into steps automatically,
call tools dynamically,
and correct their own execution paths in real time,

a huge category of “workflow-frozen software” starts losing value fast.

For decades, software has trained humans to adapt to software. You open the right app, learn the right menu tree, follow the prescribed workflow, and hope your problem fits the box. The agent era flips the direction: software adapts to human intent.

That shift has a brutal implication: the core of software stops being UI, menus, and fixed workflows. The core becomes APIs and capability interfaces. Everything in the middle—the layers whose main job is turning workflows into a clickable experience—gets compressed.

Many products won’t “die.” They’ll be absorbed.

Tools that are mostly UI-wrapped procedures.
SaaS products that are largely data shuttling.
Systems whose main value is rigid rule execution.

Agents don’t need to replace them by competing head-on. They can simply embed them as invisible steps in an orchestration graph.

III. The moat is moving

Traditional software moats looked like:

complex feature depth,
data lock-in,
sticky workflows,
custom enterprise integrations.

But in an agent world, features can be composed on demand, workflows can be generated dynamically, and data can be surfaced through standardized interfaces. The moat migrates to things that are harder to synthesize by “tool composition” alone:

high-quality proprietary data assets,
specialized vertical knowledge,
security, compliance, and governance maturity.

Put bluntly: software shifts from selling features to selling capability access and safe execution.

In the agent era, the winning product is less “a beautiful UI” and more “a reliable interface to real power—with guardrails.”

IV. Startups are being rewired

The classic playbook for software startups was familiar:

pick a scenario,
build a product,
polish the UX,
retain users,
scale subscriptions.

The agent-era playbook is different:

pick a high-value capability domain,
expose it as an agent-callable interface,
integrate into the skills ecosystem,
create value through execution, not clicks.

Entrepreneurship shifts from “building an app” to building callable capability modules.

In a world where agents orchestrate work, owning the right tool interface is like owning a critical interchange on a highway system. You don’t need to be the entire city. You just need to sit on the route everything passes through.

V. Investment logic is being repriced

Investors used to ask:

How many users do you have?
What’s your ARR?
What’s your SaaS retention?

Increasingly, the questions will mutate into:

Can your capability be orchestrated by agents?
Do you control a defensible data interface?
Is your execution verifiably safe—auditable, permissioned, compliant?

Valuation logic will follow. Pure “feature SaaS” gets pressured. Execution infrastructure and governance layers get rewarded.

Because in the agent era, the truly expensive asset isn’t UI. It’s the right to execute—safely.

VI. Local agents are a transitional form

OpenClaw’s explosion also reveals something practical: demand for action-oriented AI is already there. People don’t just want a model that answers. They want a system that does things.

But local deployment is likely a bridge, not the destination. At scale—especially in enterprises—agents will converge toward:

cloud integration,
enterprise-grade governance,
least-privilege architectures,
compliance and audit systems.

Individuals can unlock power by removing constraints. The commercial world must do the opposite: it has to constrain power before it ships.

The long-term winners won’t be those most willing to grant authority. They’ll be those best at granting authority safely.

VII. Software won’t disappear. It will become invisible.

OpenClaw’s creator suggested that “maybe 80% of software will lose its value.” The number may be rhetorically inflated. But the direction is right.

Software doesn’t vanish. It goes dark.

Users stop operating software directly. Agents operate software on their behalf. Products shift from foreground experiences to background capability modules.

That’s not a collapse. It’s an industrial migration.

VIII. The real watershed isn’t OpenClaw. It’s what it forces us to talk about next.

OpenClaw isn’t the endpoint. It’s the first public, living demonstration of something many suspected:

LLMs are already capable of executing real-world tasks—if you give them the keys.

For the past two years, the mainstream conversation was “intelligence augmentation.” In the next few years, the dominant conversation will be delegated execution:

Who sets the boundaries of capability?
Who defines execution permissions?
Who bears responsibility when things go wrong?

Those questions—more than model size or benchmark scores—may determine where the next generation of tech giants comes from.

Closing

The significance of OpenClaw isn’t what it did. It’s what it made obvious:

the software era is ending, and the capability era is beginning.

And in the capability era, what’s truly scarce isn’t the model. It’s controllable execution power.

Authority and safety are natural enemies. The biggest winners will be the ones who can make them coexist—without pretending the tension isn’t real.

Some Basic Agentic AI terminology

In the Agent era, the most common confusion is not technical — it’s architectural. We keep mixing abstraction layers, and then we end up debating terms that were never meant to be equivalent:

Is a plugin basically an app?
Is a special agent the new app?
What’s the difference between an API and a skill?
Is a general agent a tool, or a platform?

If we don’t separate layers, these questions will keep looping forever. So here’s a clean mental model: a six-layer stack from intent down to execution.

Human Intent → General Agent → Special Agent → Skill → Plugin → API

General Agent is the default entry point and the scheduler. It interprets natural language goals, decomposes complex tasks, decides which specialists to call, determines which capabilities to invoke, sequences execution, and manages permissions. Structurally, it resembles what browsers were in the web era, what desktop operating systems were in the PC era, and what iOS SpringBoard was in mobile: the “front door” where intent is translated into actions. It is not necessarily a specialist — it is the orchestrator.

Special Agents are domain experts: coding agents, math agents, legal agents, research agents, trading agents, and so on. Functionally, they look like “apps” because they are optimized around a task domain — specific knowledge, specific toolchains, and domain-specific execution strategies. But structurally, they are no longer the entry point. In the agent era, the entry point is owned by the General Agent.

Apps belong to the mobile-era abstraction. The traditional loop is user-driven: the user opens an app, navigates a UI, and triggers actions. In the agent era, the loop becomes orchestration-driven: the user expresses intent once, the General Agent dispatches the work, specialists and tools execute, and the result returns. Apps won’t disappear overnight, but many will lose their role as the primary interface. Some will degrade into background capabilities; others will survive as “special agents with UI.”

Then come the lower layers that people often collapse into one.

Skills are capability declarations — a semantic contract the model can understand. They describe what can be done, which parameters are required, what outputs are produced, and which permissions are needed. Skills live in the language layer; they don’t execute code. They exist so the model can plan.

Plugins are execution wrappers — the part that actually runs. They encapsulate API calls or local system access, handle authentication and permissions, manage errors, and return structured results. If skills are “what can be done,” plugins are “how it gets done.”

APIs are the lowest-level interfaces — the protocol surface that exposes underlying systems as callable endpoints. APIs do not think, decide, plan, or schedule. They are passive responders. If you like metaphors: electricity is the capability; the API is the wall socket.
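To make the three lower layers concrete, here is a minimal sketch of how a skill declaration, a plugin wrapper, and a passive API might relate. Everything named here (`Skill`, `Plugin`, `weather_api`, the `"network"` permission string) is a hypothetical illustration, not any real framework's interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Skill:
    """Declarative contract the model plans against; it holds no executable code."""
    name: str
    description: str
    parameters: Dict[str, str]  # parameter name -> type description
    returns: str
    required_permissions: List[str] = field(default_factory=list)


def weather_api(city: str) -> Dict[str, Any]:
    """Stand-in for a passive backend endpoint: parameters in, data out."""
    return {"city": city, "temp_c": 21}


@dataclass
class Plugin:
    """Execution wrapper: checks permissions, calls the API, returns structured results."""
    skill: Skill
    call: Callable[..., Dict[str, Any]]

    def run(self, granted: List[str], **kwargs: Any) -> Dict[str, Any]:
        missing = [p for p in self.skill.required_permissions if p not in granted]
        if missing:
            return {"ok": False, "error": f"missing permissions: {missing}"}
        unknown = set(kwargs) - set(self.skill.parameters)
        if unknown:
            return {"ok": False, "error": f"unknown parameters: {sorted(unknown)}"}
        try:
            return {"ok": True, "result": self.call(**kwargs)}
        except Exception as exc:  # report failures as data instead of crashing the loop
            return {"ok": False, "error": str(exc)}


get_weather = Skill(
    name="get_weather",
    description="Look up the current temperature for a city.",
    parameters={"city": "string"},
    returns="object with city and temp_c",
    required_permissions=["network"],
)
weather_plugin = Plugin(skill=get_weather, call=weather_api)
print(weather_plugin.run(granted=["network"], city="Paris"))
```

The planner would only ever see the `Skill` fields; the privileges and the error handling live in the `Plugin`, and the API function stays a passive responder.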

So who is the “new app”?

From a task-function perspective: Special Agent ≈ the new app.
From an entry-point perspective: General Agent ≈ the new operating system.
From an execution-unit perspective: Plugin ≈ the new software primitive.

In other words, the mobile-era “app” is being decomposed into entry-point control, capability interfaces, and execution wrappers. The most strategic control point shifts to orchestration: whoever controls the General Agent controls the new default entry point.

Finally, a quick note on industry evolution. Early agent architectures were plugin-first: LLM + plugins = an executor. OpenAI even explored a “plugin store” storyline, reminiscent of app stores. The reason that pattern didn’t become the dominant ecosystem isn’t that plugins are useless. It’s that plugins are dangerous: they hold real privileges, and in an agent loop they can be triggered automatically, not necessarily by a human click. Discovery and scheduling are also harder when the “buyer” is a model. Most importantly, plugins expand what can be done — but the harder bottleneck is deciding what should be done, in what order, under what constraints.

That is why skills emerged as a lighter semantic layer, and why modern architectures insert governance and orchestration between the model and execution. Plugins didn’t disappear; they moved downward in the stack.

This isn’t “plugins failed.” It’s the software unit migrating. The new game is not only capability — it’s orchestration.

OpenClaw as a case study of the coming Agentic AI era

The agent era just hit a visible inflection point, and OpenClaw is a useful (and slightly terrifying) case study.

What’s striking about OpenClaw is not a technical breakthrough. It didn’t train a new model. It didn’t propose a new reasoning mechanism. It didn’t “beat” scaling laws.

It did something simpler—and far more consequential: it connected an already-strong LLM to real-world execution privileges.

Browser control. Filesystem access. Shell execution. API orchestration.

The model always had the “brain.” What changed is that we finally handed it the “keys.”

That’s why OpenClaw feels like a capability explosion. The intelligence didn’t suddenly appear; it was already there. We just didn’t dare to give it OS-level agency. OpenClaw shows us, in a vivid and unfiltered way, what happens when we do.

There’s also a psychological accelerant here: local deployment.

When something runs on your own machine, it creates a strong sense of sovereignty—“my process, my disk, I can kill it anytime, worst case I pull the plug.” That physical sense of control is real, but the safety inference often isn’t.

Local deployment improves visibility and the feeling of controllability. It does not automatically reduce the attack surface. Prompt injection doesn’t disappear because the agent is local. Permission creep doesn’t shrink because the hardware sits on your desk. Visibility can create calm; calm can be mistaken for security. That “controllability illusion” is arguably a major reason agentic systems are suddenly easier for people to accept.

The deeper reason this moment feels explosive, though, is composition.

In the traditional software world, capability composition is slow and human-driven: projects, teams, tickets, code, deployment, the whole software development lifecycle. In the “LLM + skills” world, composition becomes real-time, automated, and continuous. An agent can run 24/7, try pathways, fail, self-correct, and recombine tools endlessly. When capabilities are modular functions or skills, combinatorics becomes the growth engine. Explosion is not a metaphor; it’s the natural math of composition.
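A rough way to see the math: count how many ordered tool pipelines an agent could try. The sketch below assumes a simplified model in which any tool may follow any other and repeats are allowed; `count_pipelines` is a hypothetical helper, and real agents explore far fewer paths because most combinations are meaningless.

```python
def count_pipelines(num_tools: int, max_steps: int) -> int:
    """Count ordered tool sequences of length 1..max_steps, repeats allowed."""
    return sum(num_tools ** length for length in range(1, max_steps + 1))


for tools in (5, 20, 50):
    print(tools, "tools ->", count_pipelines(tools, max_steps=4), "pipelines of up to 4 steps")
```

Under this model the count grows as a power of the number of tools and exponentially in the number of steps, which is why adding tools multiplies possibilities rather than adding to them.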

It’s also telling that an open-source, individual-driven project became the flashpoint. Large companies have strong reasons not to grant OS-level permissions lightly: legal liability, brand risk, regulatory pressure, and security maturity constraints. Individuals and small teams have fewer brakes. With fewer constraints, capabilities surface faster, which makes such projects a clearer window into the future agent world.

All of this reframes the real safety problem.

LLMs are the brain. Agents are the hands.

The brain-safety conversation has been loud for two years. The hand-safety conversation is just beginning, and it is the riskier and harder of the two. A wrong answer is frustrating. A wrong action can be irreversible. Killing a process isn’t governance. Pulling the plug isn’t governance. Governance means boundary verification and least-privilege execution designed into the architecture, not added as a last-minute guardrail.
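As an illustration of what “boundary verification and least-privilege execution” could mean in code, here is a minimal sketch: a deny-by-default action allowlist plus a check that every file target stays inside one working directory. The `Sandbox` and `PolicyError` names and the example paths are assumptions made for this sketch; a production system would need far more (symlink policy, network rules, auditing).

```python
from pathlib import Path
from typing import FrozenSet


class PolicyError(PermissionError):
    """Raised when an action falls outside the agent's granted boundary."""


class Sandbox:
    def __init__(self, root: str, allowed_actions: FrozenSet[str]) -> None:
        self.root = Path(root).resolve()
        self.allowed_actions = allowed_actions  # anything not listed is denied

    def check(self, action: str, target: str) -> Path:
        """Return the resolved path if the action is allowed, else raise PolicyError."""
        if action not in self.allowed_actions:
            raise PolicyError(f"action not granted: {action}")
        resolved = (self.root / target).resolve()
        if resolved != self.root and self.root not in resolved.parents:
            raise PolicyError(f"path escapes sandbox: {target}")
        return resolved


sandbox = Sandbox("/tmp/agent_work", frozenset({"read"}))
print(sandbox.check("read", "notes/a.txt"))
```

The point of the design is that the check runs before any plugin executes, so a prompt-injected request to delete files or read `../secrets.txt` fails at the boundary rather than after the fact.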

We may still debate whether “AGI” is here. But one thing is already clear: we’ve entered the era of automated action. 2025-2026 marks the phase transition from the generative AI era to the agentic AI era. The central challenge now is not purely technical; it’s designing a workable balance between delegated power and embedded safety, before the diffusion of OS-level agency outpaces the diffusion of governance.

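The Agent Era’s Tipping Point: On OpenClaw’s Security Risks

Author’s note: The runaway popularity of OpenClaw, the “Spring Festival crayfish,” is a genuine phenomenon. What should have been a geek-community pastime set the whole industry buzzing; at home and abroad, almost everyone is talking about it. Why did agents need OpenClaw to catch fire and be noticed? The root cause is that local deployment gives people a sense of security, but it may be a false one, a “controllability illusion.” The open-source OpenClaw agent framework contains almost no security safeguards. The do-anything OpenClaw we see today is best read as a live demonstration of agent potential in an idealized world without inhibitions. What does it mean that a single developer, using existing models and open-source frameworks, can assemble an agent of this caliber? It means the “nuclear weapon” is showing signs of spreading to the general public. OpenClaw is shocking not because it created new capability, but because it let us see clearly, for the first time, that the capability of large models was there all along; we simply had not dared to hand over the keys. OpenClaw fast-forwarded us to what a capability explosion looks like. Why does capability explode? Because capability is composed from functions and skills, and composition is explosive by nature. Before LLM agents, programmers composed these capabilities by hand, one approved software project at a time. In the new agentic AI era, where LLMs and the skills ecosystem divide the labor, any capability can be composed at any moment. OpenClaw sits there composing around the clock without eating, drinking, or sleeping, trying things live and correcting itself over and over; it would be strange if it did not explode. It feels as if we have entered an AGI era in which the only limit is what you can imagine, not what can be done. The stunning showing of the Doubao phone a while back and the current craze over this geek-built OpenClaw both say the same thing: the demand is there, the technology is there, and the bomb is powerful enough; what was missing was the right moment and trigger for steadily delegating authority. But security risk will be the biggest challenge from here on.

Key takeaways: local deployment creates a “controllability illusion”; open-source agents have almost no safety guardrails; the capability explosion comes from composition, not from a single-point breakthrough; big companies did not dare grant “operating-system privileges,” while individual developers did; risk may spread faster than governance; the agent boom is a question of balancing delegation and safety, not a purely technical barrier; 80% of software may be rewritten.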

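I. OpenClaw Is Not a Capability Breakthrough but a Permission Unlock

OpenClaw’s shock value does not come from a new algorithm.

It did not train a new model. It did not propose a new reasoning mechanism. It did not break scaling laws.

It did just one thing: it connected an already capable large model to real-world execution privileges.

Browser control.
Filesystem access.
Shell execution.
API orchestration.

The model already had planning and reasoning ability. We simply dared, for the first time, to hand it the keys.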

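II. Local Deployment Creates a “Controllability Illusion”

There is also a psychological factor behind OpenClaw’s popularity.

Running locally brings a strong sense of sovereignty.

The process is on your own computer.
The data is on your own disk.
You can kill it at any time.
You can even pull the plug.

This “right of physical termination” produces a feeling of psychological safety.

But we need to stay clear-headed:

Local deployment addresses the control path,
not the attack surface.

Prompt injection does not disappear because the agent is local.
Permission creep does not shrink because the hardware sits on your desk.

Local brings visibility.
Visibility brings peace of mind.
Peace of mind is not necessarily security.

This “controllability illusion”
is precisely the buffer that lets the public accept agents.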

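III. The Capability Explosion Comes from Composition, Not Breakthrough

The real accelerator of the agent era
is not model upgrades
but the exponential growth of composable capability.

In the traditional software era,
capabilities were composed by hand.
Every feature needed project approval, coding, and deployment.

In the era of LLM + skills,
composition becomes real-time, automatic, and continuous.

An agent runs 24 hours a day,
constantly trying paths,
constantly correcting,
constantly recombining.

Capability does not grow linearly;
the path space explodes.

Composition is explosive by nature.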

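IV. Big-Company Restraint and Individual Developers’ Boldness

Why was it an open-source, individual project that set this off?

Because big companies do not dare grant “operating-system-level privileges.”

Legal liability.
Brand risk.
Regulatory pressure.
Security maturity.

These factors mean
big companies can release capability only inside a “safety shell.”

Individual developers face no such constraints.

When constraints fall away,
capability shows itself.

OpenClaw is not ahead technically;
it is simply less constrained.

That let us see, for the first time,

that the capability explosion had been there all along.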

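V. Risk May Spread Faster Than Governance

If a single developer,
using existing models and open-source frameworks,
can assemble an agent of this caliber,

that means:

The capability threshold is dropping.
Execution power is being democratized.
Risk is spreading to the general public.

This is not a closed, nuclear-weapons-grade technology.
It is a capability structure that can be copied, assembled, and redistributed.

When capability spreads faster
than governance can be designed,
structural risk appears.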

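VI. The Agent’s Real Challenge Is Execution Safety, Not Model Safety

The LLM itself is the “brain.”

The agent is the “hands.”

The brain’s safety problems
have been widely discussed over the past two years.

The hands’ safety problems
are only beginning.

Once a model has:

- the ability to execute continuously
- the ability to invoke tools autonomously
- the ability to schedule permissions

errors are no longer “wrong answers”
but “wrong actions.”

And wrong actions can be irreversible.

This will force us to redefine “controllability.”

Killing a process is not governance.
Pulling the plug is not governance.
Real governance is boundary verification and least privilege.

In the agent era,
safety must be built into the architecture,
not bolted on afterward as a guardrail.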

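VII. The Structure of Software May Be Rewritten

When agents can:

- understand intent directly
- compose tools dynamically
- correct their paths in real time

a great deal of software built around “frozen workflows”
will indeed lose its value.

Not all of it will disappear.
But many utility tools will be absorbed.

Software shifts from “feature modules” to “capability interfaces.”

The software of the future
is no longer used directly by people;
it is dispatched by agents.

This is a structural migration.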

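VIII. We Are Standing at the Tipping Point

OpenClaw is only a signal light.

It tells us:

AGI may not have arrived,
but the “era of automated action” has begun.

This era is defined not by smarter models
but by systems more willing to release privileges.

The real challenge
is not whether models will go out of control.

It is this:

when machines start acting continuously on our behalf,
are we ready
to redesign the structure of power and responsibility?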

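Clarifying Some Terms of the Agent Era

How exactly do General Agent, Special Agent, App, API, Skill, and Plugin differ?

The easiest thing to confuse in the agent era is not the technology but the level of abstraction. Many discussions quietly mix things from different layers:

- Is a plugin an app?
- Is a Special Agent the new-era app?
- What is the difference between an API and a skill?
- Is a General Agent a tool or a platform?

Without layering, these questions stay tangled forever. Here is a structured framework.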

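I. Six Layers, from Top to Bottom

We can divide the software structure of the agent era into six layers:

1. Human Intent
2. General Agent (entry point and scheduling)
3. Special Agent (task expert)
4. Skill (agent-facing capability declaration)
5. Plugin (agent-facing capability execution module)
6. API (program-facing capability interface)

This is a complete chain from abstraction to execution. The layers contain one another, but they are not equivalent.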

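II. General Agent: Entry Point and Scheduler

The General Agent is the new era’s “default entry point.” It is responsible for:

- understanding natural-language goals
- decomposing complex tasks
- deciding which Special Agent to call
- deciding which Skills to invoke
- managing permissions and execution order

It is not necessarily an expert at any particular task. It is the “chief dispatcher.” Structurally, it is closest to:

- the browser (the entry point of the web era)
- the desktop operating system (the entry point of the PC era)
- SpringBoard (iOS’s user-interaction layer, the entry point of the mobile era)

The General Agent is not a functional tool. It holds the right to interpret intent.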

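III. Special Agent: Task-Domain Expert

A Special Agent is an agent optimized for one class of tasks.

Examples: Coding Agent, Math Agent, Legal Agent, Research Agent, Trading Agent, and so on.

They have:

- domain-specific knowledge
- domain-specific toolchains
- domain-specific execution strategies

Functionally, a Special Agent resembles the new era’s app: it provides capability around a task domain. But in terms of system structure, the Special Agent is no longer the entry point. The real entry point is the General Agent.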

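IV. App: The Human-Facing Functional Unit

The app belongs to the mobile era’s abstraction.

Its characteristics:

- the user opens it deliberately
- operation is UI-driven
- functions are organized by menus
- it is scheduled directly by the operating system

Traditional logic:

user → opens app → taps → execution

Agent-era logic:

user → General Agent → dispatches Special Agent → calls Plugin

Apps may:

- degrade into background capability interfaces
- or become “Special Agents with a UI”

Apps will not disappear right away, but they will lose their entry-point status.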

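V. Skill: The Capability Declaration Layer

A Skill is the semantic unit of capability in the agent world.

It defines:

- what can be done
- which parameters are needed
- what result is returned
- which permissions are required

A Skill is similar to a function registration. It lives in the language layer. The model reads Skill descriptions to understand “what capabilities can be called.” The Skill itself does not execute code.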

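VI. Plugin: The Execution Wrapper Layer

The Plugin is the real unit of execution.

It:

- wraps API calls
- or wraps local system access
- manages permissions
- handles exceptions
- returns structured results

A Plugin is a capability module that an agent can call.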

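VII. API: The Low-Level Capability Interface

An API is the protocol interface to a capability. In essence it is an interface abstraction: it packages a complex underlying system as a callable unit. When we say a company “opens its API,” we mean it lets others access its capabilities programmatically.

But an API itself does not think, decide, or plan. It answers only one question: if someone calls me, what should I return?

The key point is that an API is passive. It owns no scheduling authority, no decision authority, no priority management, and no permission-allocation logic. It gets called. In the traditional software era:

user, through the UI → calls API

In the agent era:

Plugin or Special Agent → calls API

The API always sits at the end of the execution chain. It never initiates behavior.

Many people assume the API is the capability. More precisely, the API is the interface to the capability; it merely makes the capability accessible. If capability is electricity, the API is the wall socket.

In the agent era, the API’s standing has changed.

In the mobile internet era, the app was the basic unit and the API was a technical layer hidden behind it. In the agent era, the API matters more, because agents dispatch capabilities through APIs. When users no longer operate apps directly, the API becomes the real capability interaction layer. Software shifts from “UI product” to “capability interface.”

Even in the agent era, the API does not become the entry point. It only responds to requests at the very end of the execution chain. It says: give me parameters and I return a result. It does not ask what should be done now, whom to call, or which task comes first. Those are questions for the scheduling layer. The execution chain can be written as:

user
→ General Agent
→ Special Agent / Plugin (optional)
→ API
→ data / system resources

The API is the last hop of execution. The General Agent holds entry-point authority; the Special Agent holds task-domain strategy; the API provides the underlying capability.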

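VIII. Who Is the New Era’s “App”?

From a task-function perspective: Special Agent ≈ the new era’s app

From an entry-point perspective: General Agent ≈ the new era’s operating system

From an execution-unit perspective: Plugin ≈ the new era’s basic software module

The app is being decomposed into entry-point control + capability interfaces + execution wrappers.

In the mobile era, the app was the basic unit.

In the agent era, the Plugin may become the basic unit and the General Agent the default entry point. Real commercial power will concentrate in whoever controls the General Agent.

A Special Agent looks like the new era’s app. But if the General Agent is strong enough, it can often directly:

- compose Skills dynamically
- call Plugins
- bypass the Special Agent

At that point, the Special Agent may itself degrade into a configuration file.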

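IX. From Plugin to Skill

In the early stage of agent development, the whole industry had a very natural idea:

if the model needs to actually “do things,” give it plugins. So the first generation of agent architecture was very direct: LLM + Plugin = executor.

A plugin could be:

- a browser automation module
- a database access module
- a Gmail plugin
- a Stripe payment plugin
- a local shell executor

The logic was simple:

the model thinks, the plugin acts. OpenAI at one point tried to build a “Plugin store,” hoping to replicate the success of the mobile-era App Store. It looked reasonable. So why did people later conclude that plugins “didn’t stick”? On the surface the ecosystem never took off; underneath, the cause was structural conflict.

First, the security burden was too heavy. A plugin is code. It holds real privileges:

- the right to call APIs
- the right to execute locally
- the right to access credentials

Once a prompt injection tricks the model into calling it, the plugin is a live executor with real consequences. Plugins are not triggered by a human click; they may be triggered automatically by the model. Risk rises sharply. The plugin store turns into a risk store.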

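Second, discovery and scheduling were too complex. In the mobile era, people choose apps. In the plugin era, the model chooses plugins. That raises new problems:

- How does the model judge plugin quality?
- How does it judge safety?
- How does it handle conflicts between plugins?
- How does it manage priority?

A plugin market is not a market that humans browse; it is a market that models schedule.

Third, plugins solve “what can be done,” not “what should be done.” The plugin is the execution layer. But the LLM’s real bottleneck lies in:

- understanding the task
- decomposing the task
- choosing tools
- planning steps

Plugins extended capability without solving scheduling. The industry came to realize that the problem was not in the execution layer but in the decision layer.

Hence the Skill abstraction. A Plugin is code. A Skill is a semantic capability declaration. The Plugin tells the system “how to do it.” The Skill tells the model “what can be done.” Skills are lighter, more standardized, and better suited to being understood and planned over by a model.

The architecture changed too. Early structure: LLM → Plugin → API

Evolved structure: LLM → Skill → safety scheduling layer → Plugin / API

One layer was added: scheduling and governance. Plugins did not disappear. They were pushed down to the bottom of the stack.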

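So is a Plugin an app? Many people have the intuition that a Plugin is just an app that the agent era has reworked a little.

That intuition is half right, because many early Plugins really did wrap an existing app’s API as an agent-callable module. A Gmail Plugin essentially connects to Gmail. A Slack Plugin essentially connects to Slack. It looks like “the agent version of the app.” But the two are not quite the same.

Mobile era:

App = function + entry point + execution

Agent era:

- General Agent = entry point
- Special Agent = task aggregation
- Plugin = execution wrapper
- API = underlying capability

The app has been taken apart. The entry point was pulled out. Execution was wrapped. Capability was abstracted.

The Plugin inherits the “execution part.” The General Agent inherits the “entry-point part.”

Plugins did not fail; they were demoted. The Plugin store never became an explosive ecosystem like the mobile-era one, not because plugins are useless, but because:

the real value in the agent era lies not in extending capability but in scheduling it.

The Plugin is the execution component left after the app is deconstructed. The Skill is a semantic abstraction over the Plugin. The General Agent redefines the app’s entry-point authority. This is not plugins failing. It is the basic unit of software migrating.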

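When the Agent Becomes the Default Entry Point, Does the App Store Model Collapse?

The verdict: not right away. But its “structural profit” will be eroded, and in a way that is very hard to see.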

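I. What the App Store Really Sells Is Not Apps but Entry-Point Control

The App Store’s core asset is not software distribution.

It is:

- the user entry point
- control over traffic distribution
- control of the payment channel
- the right to take an ecosystem commission

In the mobile era:

user → opens app → uses service
Apple/Google controls the entry point → takes 30%

This structure holds on one premise:

the user must actively open the app. As long as the app is the entry point, the platform holds the gate to traffic and profit.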

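II. The Agent Era’s Fatal Change: The App Is No Longer the Entry Point

When the agent becomes the default entry point, the flow becomes:

user → tells the agent → agent dispatches capabilities → calls the app’s backend APIs

Note the key change:

the user no longer “opens the app.” The app becomes a background capability module. When users cannot perceive the app, its brand value and entry-point value decline. Entry-point control begins to shift to the agent.

Whoever controls the agent controls the traffic. That is the structural threat to the App Store.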

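III. The App Store Will Not Collapse, but It Will Be Hollowed Out

It will not disappear right away. But three things will happen:

1. Fewer UI apps

Many utility apps will be absorbed into the agent.

- calendar scheduling
- simple editing
- information aggregation
- data shuttling

These become background capabilities. Users will not even know which app is being called.

2. The commission logic gets challenged

If the agent calls a cloud API directly instead of going through an iOS in-app purchase, the platform’s commission path is bypassed. The agent may complete the transaction directly through a web API. That weakens the 30% model.

3. A “skills market” replaces the “apps market”

The future may bring:

- agent skill markets
- markets for skill modules and plugins
- API ecosystems

The App Store no longer sells “apps”; it sells “skills that agents can call.” This is a change of form, not a disappearance.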

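IV. The Real Conflict: Who Controls the Default Agent?

The core question is not the App Store. The core question is:

who becomes the default agent?

- If it is Apple’s agent → the App Store is absorbed
- If it is an OpenAI / Anthropic agent → the platform is bypassed (it drops out of the value chain)
- If it is an open-source local agent (such as OpenClaw) → the platform’s commission is weakened (it stays in the chain, but with less bargaining power)

Once entry-point control shifts, profit migrates with it. That is the root of platform anxiety.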

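V. Why Are Big Companies So Cautious About Advancing Agents?

Because they have to strike a balance.

If the agent is too strong:

- users stop opening apps
- platform commissions fall
- the developer ecosystem gets restructured

If the agent is too weak:

- users move to third-party agents
- entry-point control gets taken away

It is a very delicate game. The big companies’ strategy will be:

control the agent so that it strengthens the ecosystem rather than replacing it.

Agents will not directly destroy the App Store. But they will demote it from an “entry-point platform” to a “capability supply market.”

Entry-point value will be compressed. The profit structure will be recalculated. And the real winner is not the platform that sells apps, but:

the platform that defines the agent’s scheduling rules.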

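VI. The Final Question

The king of the mobile internet era was whoever controlled the entry point.

The king of the agent era will be whoever controls intent interpretation and execution scheduling.

When a user says just one sentence: “Get this done for me.”

At that moment, the one who really decides where the money flows is no longer the owner of the app icon. It is the agent doing the dispatch in the background.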

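When the Agent Becomes the Default Entry Point, Will the Operating System Be Rewritten?

The answer: it is not a matter of “whether”; it is already happening. But it is unlikely to happen in the way we are used to.

The Operating System in the Agentic AI Era

I. The History of Operating Systems Is, at Bottom, a Fight over the Entry Point

Every generation of operating system has been a reshuffling of the entry point.

- DOS: the command line is the entry point
- Windows / macOS: the graphical desktop is the entry point
- iOS / Android: the app icon is the entry point
- Web era: the browser is the entry point

The crux of an operating system has never been the kernel code. It is the question of how users express intent.

When the entry point changes, the entire software ecosystem is reshuffled.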

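II. What Agents Change Is How Intent Is Expressed

Before:

you want something done → open an app → find the function → tap to execute

In the future:

you want something done → tell the agent → the agent dispatches the system

This is not a feature upgrade. It is the disappearance of the entry point. The Doubao phone and OpenClaw illustrate this vividly.

When users no longer open apps themselves and the agent calls apps instead, the app is no longer the entry point. It becomes a capability module.

The operating system is no longer organized around an “app launcher” but around a “permission scheduler.”

That is the structural change.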

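III. When the Agent Becomes the Default Entry Point, Three Things Happen to the Operating System

3.1 The UI Moves to the Back Seat

The UI is no longer the core. The interface becomes a three-layer governance tool rather than an operating tool:

- a visual feedback layer
- an approval and confirmation layer
- a monitoring and audit layer

The real execution logic lives in the agent’s orchestration in the background. There will be fewer icons and fewer menus. Step-by-step operating procedures will disappear.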

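(1) Visualization Layer

In traditional software, the interface is the control panel: you press a button and an action executes.

In the agent era, execution happens in the background. The interface only “tells you what happened.”

For example:

- the agent books a flight for you
- organizes your files
- edits your code
- runs batch API calls

You no longer click through each step. You only need to see:

- what it plans to do
- what it is doing
- what it has finished

The interface shifts from “input tool” to “status panel.” It is more like a flight instrument panel than a control stick.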

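(2) Approval Layer

This layer matters even more.

Once the agent holds execution authority, some actions must be confirmed by a human.

For example:

- deleting 2,000 files
- transferring $5,000
- signing a contract on your behalf
- sending sensitive data outside

The interface’s job becomes asking: “Do you authorize this?”

Here the UI is no longer a collection of function buttons but an interceptor at risk checkpoints.

Its core functions are to:

- show the risk level
- show the scope of impact
- offer confirm / reject

The interface becomes “the human’s final vote.”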

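(3) Audit Layer

When an agent executes automatically around the clock, you cannot watch every step. So the interface needs to provide:

- execution logs
- call records
- permission-use records
- itemized API consumption
- alerts on risks and anomalies

This is similar to:

- a bank’s transaction statement
- a cloud service’s access logs
- a Tesla’s driving recorder

The interface shifts from “operation interface” to “accountability interface.” It is not there for you to do things. It is there so you know what happened and can assign responsibility when something goes wrong.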

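A side-by-side comparison makes this clearer.

Traditional app UI:

- menus
- buttons
- forms
- workflows

Agent-era UI:

- plan graph
- execution summary
- risk alerts
- permission grants
- audit trail

You are not the “operator.” You are the “supervisor.” This is really a philosophical shift.

Before: the human is the operator and software is the tool.

In the future: the agent is the operator and the human is the arbiter.

The interface naturally recedes into feedback, authorization, and oversight.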

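(4) A More Concrete Example

Imagine a future Mac.

You say:

“Compile all of last year’s customer invoices into a financial report.”

The agent automatically:

- searches files
- calls Excel
- calls the mail API
- aggregates the data
- generates a PDF

The interface shows only:

✅ Planned steps
⚠ 3 anomalous files found
🔒 Authorize access to the finance folder?
📊 Report generated

You did not open a single app. You only supervised. The interface has not disappeared. It has changed from a “control panel” into an “accountability panel.” Whoever controls this interface holds the final decision.

This is the core that the operating system really has to hold in the agent era.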

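3.2 The Permission System Becomes the Core Asset

The traditional operating system’s security model:

- file permissions
- process isolation
- sandboxing

What the agent era needs:

- dynamic permission allocation
- temporary execution grants
- revocable capability interfaces
- verifiable execution logs

The operating system will shift from a “resource management system” to an “execution-authority governance system.”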
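One possible shape for temporary, revocable grants with a verifiable log is sketched below. It is an illustration under stated assumptions, not a description of any shipping operating system: `GrantStore`, the capability string format, and the hash-chained log layout are all hypothetical.

```python
import hashlib
import json
import time
from typing import Dict, List, Optional


class GrantStore:
    """Time-limited, revocable capability grants plus a tamper-evident event log."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}  # capability -> expiry timestamp
        self.log: List[dict] = []

    def _append(self, event: str, capability: str, now: float) -> None:
        # Each entry commits to the previous entry's hash, forming a chain.
        prev = self.log[-1]["hash"] if self.log else ""
        body = {"event": event, "capability": capability, "time": now, "prev": prev}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.log.append({**body, "hash": digest})

    def grant(self, capability: str, ttl_seconds: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._expiry[capability] = now + ttl_seconds
        self._append("grant", capability, now)

    def revoke(self, capability: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._expiry.pop(capability, None)
        self._append("revoke", capability, now)

    def use(self, capability: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        allowed = self._expiry.get(capability, 0.0) > now
        self._append("use" if allowed else "denied", capability, now)
        return allowed

    def verify_log(self) -> bool:
        """Recompute the hash chain; any edited or reordered entry breaks it."""
        prev = ""
        for entry in self.log:
            body = {k: entry[k] for k in ("event", "capability", "time", "prev")}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != digest:
                return False
            prev = entry["hash"]
        return True


store = GrantStore()
store.grant("fs.read:/finance", ttl_seconds=60)
print(store.use("fs.read:/finance"), store.verify_log())
```

A grant that expires on its own covers "temporary execution grants"; `revoke` covers "revocable capability interfaces"; and the chained hashes are one simple way to make a log checkable after the fact.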

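3.3 APIs Replace Apps

When the agent is the default entry point, the value of an app’s UI falls and the value of its API rises.

The future software ecosystem may look like this:

- front end: one super agent
- back end: countless capability interfaces

The App Store may no longer be an “apps market” but a “skills market.” Users do not download apps. Agents call skills. That rewrites the distribution model.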

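IV. Why Don’t the Big Companies Dare Open Up Fully?

Because once the agent becomes the default entry point:

- operating-system vendors lose their privileged control of the UI
- the app ecosystem gets abstracted into a capability layer (a skills store)
- the revenue model may be restructured

Imagine:

if every app on the iPhone became a “background capability” and users talked only to the agent, would app icons still matter? Would the 30% commission still be justified?

Entry-point control is profit control. That is why big companies are so restrained in advancing agents.

The Doubao phone has been hemmed in from all sides, and it is obvious whose cheese it moved. But this is where things are headed: if not the Doubao phone, then sooner or later some other phone with an operating-system-level agent will take over. Once end consumers get a taste of next-generation, OS-level agents, there is no going back.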

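V. OpenClaw Is a Preview of the “Unregulated Operating System”

OpenClaw is essentially:

a simplified “agent operating-system shell.”

It has no mature permission governance, no compliance framework, and no execution audit system. But it demonstrates one fact:

model + permission scheduling + local execution is already enough to simulate a miniature OS.

That is why it is so shocking.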

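VI. The Real Future Form

When the agent becomes the default entry point, the operating system becomes:

- a permission allocation platform
- an execution logging platform
- a capability market
- a risk control hub

The UI will be simplified. Apps will become invisible. Capabilities will be modularized.

What the user sees is a conversational entry point. What runs behind it is a permission governance system.

VII. Final Verdict

Agents will not eliminate the operating system. They will force it to evolve, from “resource scheduler” to “arbiter of execution authority.” The core asset of the agent era is:

the right to define permissions and execution boundaries.

Whoever defines the boundaries is the next-generation platform.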

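The Great Software Industry Shake-Up of the Agent Era: Starting with OpenClaw

I. OpenClaw Is Not a Technical Innovation but a Structural Event

What makes OpenClaw shocking is not technical innovation.

What it exposes is this:

large-model capability + local execution privileges + an open-source ecosystem are already enough to rewrite the production logic of software.

When a single developer, using existing models and open-source frameworks, can assemble an agent with “operating-system-level privileges,” it shows that capability is no longer scarce and that the ability to compose has become the core variable.

And the ability to compose grows exponentially.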

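II. Why Will 80% of Software Be Swallowed?

When agents can:

- understand natural-language intent directly
- decompose workflows automatically
- call tools dynamically
- correct execution paths in real time

the value of a great deal of “workflow-frozen software” will fall quickly.

The old software logic: people adapt to the software’s workflow. The coming agent logic: software adapts to people’s intent.

What does that mean?

It means:

the core of software is no longer UI, feature menus, and fixed workflows, but APIs and capability interfaces. A large amount of middle-layer software will be compressed. Products that:

- merely wrap a workflow in an interface
- are SaaS that merely shuttles data
- are systems that merely execute rules

will all be absorbed by agents. They do not disappear. They get embedded.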

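III. The Commercial Moat Is Migrating

The moats of traditional software were:

- complex features
- data lock-in
- workflow stickiness
- enterprise customization

But in the agent era:

features can be composed on the fly. Workflows can be generated dynamically. Data can be abstracted behind interfaces.

The moat starts migrating to:

1. high-quality data assets
2. specialized vertical domain knowledge
3. security and compliance capability

Put simply, software shifts from “selling features” to “selling capability interfaces and safe execution.”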

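IV. Startup Logic Is Changing

Building a software startup used to mean:

- pick a scenario
- polish the features
- optimize the experience
- lock in customers

Building an agent startup will mean:

- pick a high-value capability domain
- offer a callable tool interface
- embed in the agent (skill) ecosystem
- create value through execution capability

In other words:

entrepreneurship shifts from “building a product” to “building callable capability modules.”

Whoever controls the key tool interfaces holds a key position in the agent ecosystem.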

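V. Investment Logic Is Being Revalued

Investors used to ask:

- How many users do you have?
- What is your ARR?
- What is your SaaS renewal rate?

The questions will become:

- Can your capability be dispatched by agents?
- Do you own a hard-to-replace data interface?
- Is your execution verifiably safe?

Valuation logic will migrate. Feature SaaS will be marked down. Execution infrastructure will command a premium.

In the agent era, what is truly valuable is not the interface. It is “power that can be executed safely.”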

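VI. The Local Agent Is a Transitional Form

OpenClaw’s surge also has a practical meaning.

It tells us:

market demand for “action-oriented AI” has matured. But local deployment is only a transition. Agents that truly scale commercially will end up with:

- cloud integration
- enterprise-grade security governance
- least-privilege architecture
- compliance and audit systems

Individual developers can unlock capability. The commercial world must constrain it.

The future winners are not those most willing to delegate authority, but those who best understand how to delegate it safely.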

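VII. Software Will Not Disappear, but It Will Become Invisible

OpenClaw’s author said that perhaps 80% of software will lose its value.

The number may not be precise. But the direction is clear: software will not all disappear. It will become invisible.

Users no longer use software directly. Agents call software on their behalf. Software goes from “foreground product” to “background capability module.”

This is a migration of industrial form.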

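VIII. The Real Watershed

OpenClaw is not the endgame.

It is only the first public demonstration that

large models already have the ability to execute real-world tasks.

For the past two years we discussed “intelligence augmentation.” In the next few years the discussion will be more about “the allocation of execution authority.”

When the agent becomes the default interface,

who controls the boundaries of capability?
Who defines execution permissions?
Who bears the risk and responsibility?

These questions may well determine where the next generation of tech giants is born.

Closing

The significance of OpenClaw lies not in what it did, but in what it made us realize:

the software era is ending, and the capability era is beginning.

And in the capability era, what is truly scarce is not the model but controllable execution power. Delegation and safety are old adversaries; the question is who will be best at reconciling and balancing them.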

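The 2026 Agent Paradigm Explosion: From Cognitive Illusion to Industrialized Collaboration

Introduction: Convergence and Breakout in the Year of the Agent

In the history of artificial intelligence, 2025 to 2026 can be seen as the watershed between “generative AI” and “agentic AI.” The boom of 2023 and 2024 centered on the text-generation and conversational abilities of large language models (LLMs). Their performance astonished the world, but the biggest disappointment of the early LLM boom was that productivity gains at scale were almost nowhere to be seen. Early GPT-class models displayed a high level of intelligence, yet in real production environments agents were stuck in an awkward position: unstable task execution, blurry permission boundaries, and fragile handling of long tasks meant that any workflow longer than about five steps could not be relied on.

Entering 2026, however, agent technology accelerated markedly and abruptly. The acceleration was no accident. It resulted from the standardization of underlying protocols, clearer architectural layering, and a rapid fall in inference cost, typified by mixture-of-experts (MoE) models. The current industry consensus is that an agent is no longer just a chatbot; it has become a “digital employee” that can plan, decompose tasks, call tools, and complete closed-loop tasks autonomously in complex environments. This marks a fundamental restructuring of the software interaction paradigm: software is no longer a tool that is passively clicked but an entity that acts on its own initiative.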

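Part 1: Standardized Protocols and the Rise of the “Internet of Agents”

The first variable behind the agent leap of 2026 is that interoperability protocols were established at the infrastructure layer. Before 2025, developers had to integrate different APIs and data sources for every model, and this fragmentation severely held back the ecosystem’s growth.

1.1 The Model Context Protocol (MCP) Goes Mainstream

The Model Context Protocol (MCP), proposed by Anthropic in late 2024 and embraced across the board by OpenAI, Google, and Microsoft in 2025, became the “USB port” of the agent era. MCP aims to standardize how AI systems access external tools and data safely and uniformly. In December 2025, MCP was formally donated to the Agentic AI Foundation (AAIF) under the Linux Foundation, marking the protocol’s move from a company-owned standard to neutral global governance.

MCP’s core contribution is its standardized specification for data ingestion and transformation. It offers SDKs in mainstream languages such as TypeScript, Python, and Java, allowing agents to connect directly to content repositories, business management systems, and development environments without custom development. The “MCP Tool Search” feature introduced in early 2026 further addressed the problem of redundant tool definitions crowding the context window.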

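Key feature	Traditional API integration	MCP protocol
Integration cost	Custom “glue code” written for each model	Build once, connect to many models
Context usage	All tool definitions preloaded, costing up to 67k+ tokens	Lazy loading: tool documentation fetched on demand
Security	API keys scattered across applications; permissions hard to manage	Token-based fine-grained permission control and auditing
Scalability	Grows linearly; hard to maintain	Dynamic registration; supports concurrent calls to 50 or more tools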
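The lazy-loading row can be illustrated with a small sketch: keep only one-line tool summaries in the context, and fetch a full definition the first time a tool is actually needed. This is not the MCP SDK or the MCP Tool Search API; `LazyToolRegistry` and its methods are hypothetical names used only to show the idea.

```python
from typing import Callable, Dict, List


class LazyToolRegistry:
    """Keeps short summaries up front; loads full tool definitions on demand."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}
        self._loaders: Dict[str, Callable[[], dict]] = {}
        self._loaded: Dict[str, dict] = {}

    def register(self, name: str, summary: str, loader: Callable[[], dict]) -> None:
        self._summaries[name] = summary
        self._loaders[name] = loader

    def search(self, query: str) -> List[str]:
        """Cheap keyword match over names and summaries only."""
        q = query.lower()
        return [n for n, s in self._summaries.items() if q in n.lower() or q in s.lower()]

    def definition(self, name: str) -> dict:
        """Load the full definition once, then serve it from the cache."""
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded_count(self) -> int:
        return len(self._loaded)


registry = LazyToolRegistry()
registry.register(
    "send_email",
    "Send an email message",
    lambda: {"name": "send_email", "params": {"to": "string", "body": "string"}},
)
registry.register(
    "query_db",
    "Run a read-only SQL query",
    lambda: {"name": "query_db", "params": {"sql": "string"}},
)
print(registry.search("email"), registry.loaded_count())
```

With many registered tools, the model's context carries only the summaries until a search hit is selected, which is the saving the table attributes to lazy loading.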

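1.2 The Agent-to-Agent (A2A) Protocol and Horizontal Collaboration

If MCP handles the vertical connection between agents and tools, the Agent-to-Agent (A2A) protocol, launched by Google in April 2025 and later donated to the Linux Foundation, handles horizontal collaboration among agents. A2A defines a standard set of communication primitives so that agents from different vendors, running on different frameworks, can divide work and cooperate the way a human team does.

A2A’s core components include the “Agent Card” and the “task object.” An Agent Card, much like a model card for an LLM, describes an agent’s capabilities, authentication requirements, input and output modalities, and supported skills, so that agents can discover one another and assess whether to collaborate. The task object manages the full lifecycle of cross-agent work, including state transitions such as submitted, working, input required, completed, and failed, which provides the technical footing for asynchronous collaboration lasting hours or even days.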
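The task lifecycle described above can be pictured as a small state machine. The sketch below uses only the five states named in the text; the transition table is an assumption made for illustration, and the `Task` class is hypothetical rather than the A2A specification's own data model.

```python
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition rules for this sketch; terminal states allow no further moves.
ALLOWED = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED, TaskState.FAILED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


class Task:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.state = TaskState.SUBMITTED
        self.history = [TaskState.SUBMITTED]

    def transition(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition: {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


task = Task("report-42")
for step in (TaskState.WORKING, TaskState.INPUT_REQUIRED, TaskState.WORKING, TaskState.COMPLETED):
    task.transition(step)
print([s.value for s in task.history])
```

Keeping the state explicit is what lets a task pause for human input for hours and then resume, since any participating agent can read where the work stands.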

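Part 2: Architectural Layering: Decoupling the Cognitive Core from Execution Units

Another core variable behind the 2026 agent breakout is deep layering at the architecture level. Early attempts often wanted the large model to do everything, from understanding intent to executing concrete code. In practice, though, there is an inherent tension between the model’s nondeterminism and the determinism that systems require.

2.1 The Four-Layer Architecture Matures

Leading practice now decomposes agent architecture into a cognitive layer, a skill layer, a connection layer, and a persistence layer. This layering greatly improves controllability and extensibility.

Cognitive Layer: Played by the LLM, it handles intent understanding, task decomposition, plan generation, and multi-turn dialogue management. It acts as the “brain”: highly flexible, but nondeterministic.
Skill Layer: Contains atomic execution units (Skills). These units have clear boundaries, well-defined inputs and outputs (a schema), and auditable operation records. For actions with side effects, such as sending email, transferring money, or modifying data, the skill layer provides a deterministic execution framework.
Connection Layer: Connects skills to the outside world, including databases, SaaS systems, corporate intranets, and the terminal command line. It is the agent’s “hands” and “interfaces.”
Persistence Layer: Manages state and memory. It stores not only conversation history but also task checkpoints, long-term preferences, and behavior trajectories, giving the agent continuity over time.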

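2.2 Skills as a Paradigm Beyond APIs

In the development vocabulary of 2026, “skill” has been redefined and is no longer a mere synonym for API. An API is essentially for programmers to call, and its composition logic is hard-coded; a skill is for the model to plan over, and its composition logic is generated dynamically at runtime.

Wrapping operations as skills lets a system provide the following higher-level functions:

Runtime composition: The model can choose the best path through the skill graph dynamically, based on the user’s immediate need, rather than following preset if-then logic.
Observability and auditing: The skill layer can track each execution unit’s success rate, latency, and cost. If a step fails, the scheduling layer can retry or roll back without restarting the whole workflow.
Permission isolation: A skill can be given a specific permission scope. For example, a finance agent might have the “read invoices” skill but not permission to “execute payments” unless a human explicitly authorizes it.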

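Part 3: Skill Density: A New Yardstick for Competition in the Agent Ecosystem

As model capability reaches a plateau, the key determinant of an agent’s value is shifting from “model parameter count” to “skill density.”

3.1 Skill Density and Network Effects

Skill density is the concentration of high-quality, reusable skills behind an agent system. With 20 skills behind it, a model is just a toolbox; with 200 or more, it forms a capability graph [28].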

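In this framing, the business value of an agent system depends on two factors: skill density and the cognitive layer’s ability to compose skills. Once skill density crosses a critical point, skills can be combined and stacked recursively, so the range of problems the system can solve grows nonlinearly.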

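Stage	Number of skills	Form	Core value
Early	< 20	Scripted agent	Automates simple repetitive work
Growth	50 - 150	Vertical-industry agent	Handles complex workflows in a specific domain
Mature	> 200	General task engine	Orchestrates complex cross-system tasks and optimizes autonomously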

3.2 50%任务完成时间水平线的指数增长

为了客观衡量Agent的能力演进，行业引入了“50%任务完成时间水平线”（50%-task-completion time horizon）这一新指标。该指标衡量Agent能够以50%成功率独立完成的、原本需要人类专家处理的时长。

研究表明，前沿Agent在这一指标上的表现自2019年以来约每七个月翻一倍。2026年初，头部模型（如Claude 3.7、Gemini 3.0）在复杂软件工程任务上的50%时间水平线已达到约50分钟。这意味着，曾经需要人类开发者工作一小时的任务，现在的Agent已经有五成把握能够自主完成。
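
The doubling claim can be turned into a quick extrapolation. The sketch below assumes the trend stays a clean exponential with a seven-month doubling time and starts from the 50-minute figure in the text; both function names are hypothetical, and real progress need not follow the curve.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling period quoted in the text


def horizon_after(months: float, start_minutes: float) -> float:
    """Time horizon after `months`, assuming steady exponential growth."""
    return start_minutes * 2 ** (months / DOUBLING_MONTHS)


def months_until(target_minutes: float, start_minutes: float) -> float:
    """Months needed to reach `target_minutes` under the same assumption."""
    return DOUBLING_MONTHS * math.log2(target_minutes / start_minutes)


one_year_later = horizon_after(12, start_minutes=50)         # roughly 164 minutes
to_full_workday = months_until(8 * 60, start_minutes=50)     # roughly 23 months
```

Under these assumptions a 50-minute horizon would reach an eight-hour workday in a little under two years.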

第四部分：记忆与持久化：从一次性工具到持续体

记忆是Agent区别于传统AI助手的核心特征。在企业环境下，任务的连续性至关重要。一个“短命”的Agent无法建立长期协作关系，也无法积累项目语境。

4.1 记忆架构的三个层次

2026年的主流记忆实现已形成了三层结构，分别对应不同的功能需求：

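Task State: Records which step the current task has reached, which sub-steps are done, and what the intermediate artifacts are. This is the basis for resuming from a checkpoint and for continuing execution after human intervention.
Long-term Context: Stores user preferences, organizational constraints, past projects, and permission boundaries. It serves as the system's background knowledge and spares users from re-explaining themselves in every conversation.
Behavior Trajectory: Records the decisions the system made in similar past scenarios, the paths it chose, and what succeeded or failed. By learning from these trajectories, an agent can improve itself and avoid making the same mistake twice.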
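
The three layers can be sketched as plain data structures. The class and field names below are hypothetical; the point is only that task state, long-term context, and behavior trajectory are stored separately and that a failed step leaves the task resumable.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskStateRecord:
    steps: list[str]
    done: set[str] = field(default_factory=set)

    def next_step(self) -> Optional[str]:
        # First step not yet completed; this is the resume point.
        for step in self.steps:
            if step not in self.done:
                return step
        return None


@dataclass
class AgentMemory:
    task_state: TaskStateRecord                                            # layer 1
    long_term_context: dict[str, str] = field(default_factory=dict)        # layer 2
    trajectory: list[tuple[str, bool]] = field(default_factory=list)       # layer 3

    def record(self, step: str, ok: bool) -> None:
        # Every attempt goes into the trajectory; only successes advance the task.
        self.trajectory.append((step, ok))
        if ok:
            self.task_state.done.add(step)


memory = AgentMemory(
    task_state=TaskStateRecord(steps=["fetch", "analyze", "report"]),
    long_term_context={"report_language": "English"},
)
memory.record("fetch", ok=True)
memory.record("analyze", ok=False)   # the task stops here
resume_at = memory.task_state.next_step()
```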

4.2 记忆管理中的 Context Curation 与 DCPO 算法

随着上下文窗口的扩大，如何防止“噪音”干扰模型决策成为新难题。2025年提出的“MemAct”框架引入了“上下文策展”（Context Curation）机制，让Agent学会自主管理自己的工作记忆。

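Through the Dynamic Context Policy Optimization (DCPO) algorithm, an agent is trained to actively perform memory actions during long-horizon tasks: selectively retaining key facts, integrating new information, and pruning irrelevant redundancy. Experiments show that agents with this adaptive memory management achieve markedly higher success rates on complex tasks than models relying on a long context window alone, while also consuming fewer tokens.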

第五部分：国产大模型的异军突起

在2026年的全球Agent竞争中，中国开源大模型展现出了极强的生命力，特别是在推理效率与架构创新方面走在了前列。

5.1 阶跃星辰 Step 3.5 Flash 的技术范式

国内大模型独角兽阶跃星辰春节前推出的 Step 3.5 Flash 成为2026年初最具象征意义的模型之一。其核心理念是“智能密度”——即在保持大规模知识储备的同时，极大降低单Token的推理成本。

该模型采用了稀疏混合专家（MoE）结构：总参数量高达1968.1亿（196B），但每个Token仅激活约110亿（11B）参数。这种设计使得 Step 3.5 Flash 能够以“11B级别”的运行速度，提供“196B级别”的思考深度。

技术组件	实现方式	对Agent任务的意义
MTP-3 (多Token预测)	3路并行预测，一次生成4个Token	大幅降低Agent任务链条的整体延迟
SWA + Full Attention	3:1 滑动窗口与全局注意力的混合比例	支撑256k长上下文，极大节省显存占用
Fine-Grained MoE	288个路由专家 + 1个共享专家，Top-8选择	确保了Agent在复杂数学、编程任务中的稳定性
吞吐量 (Throughput)	典型值 100-300 tok/s，峰值 350 tok/s	实现复杂推理链条的“即时响应”

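In practical tests, Step 3.5 Flash performed strongly in mathematical reasoning (97.3 on AIME 2025) and code repair (74.4% on SWE-bench Verified), even surpassing some closed-source models with more parameters.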

5.2 国产模型的多元化演进

除了 Step 3.5 Flash，月之暗面的 Kimi K2 与阿里巴巴的 Qwen 3 也在 Agent 领域各展所长。Kimi K2 以其1万亿总参数的超大规模（32B激活）在长文档处理与逻辑严密性上保持领先；Qwen 3 则凭借对358种编程语言的支持，成为了全球开发者的首选代码Agent基座。这种“百花齐放”的局面打破了闭源模型的权力垄断，为垂直行业Agent的实验提供了低门槛的基座。

第六部分：终端平权：本地部署与隐私保护的回归

Agent 爆发的另一大推力来自硬件层的革命。2026年，AI Agent 不再仅仅运行在昂贵的云端H100集群，而是开始大规模进入个人电脑。

6.1 苹果 M5 芯片与“AI加速器”

苹果于2025年底推出的 M5 系列芯片彻底改变了本地推理的游戏规则。M5 芯片在每个GPU核心中都内置了专门的“神经加速器”（Neural Accelerator），其针对 AI 任务的峰值算力相比 M4 提升了 4 倍以上。

最关键的突破在于内存带宽。基础版 M5 的统一内存带宽达到了 153 GB/s，而 M5 Max 更是被预测将超过 550 GB/s。对于 Agent 推理而言，带宽往往是第一瓶颈。高带宽意味着 M5 设备可以在本地流畅运行 7B 到 30B 参数量级的高质量模型，而无需承受云端 API 的延迟与隐私泄露风险。
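
Why bandwidth is the first bottleneck can be seen with a rough rule of thumb: generating one token requires reading roughly all active weights from memory once, so decode speed is bounded by bandwidth divided by model size in bytes. The sketch below assumes a dense model, 4-bit weights, and ignores KV-cache traffic and compute; the function name is hypothetical and the results are upper bounds, not benchmarks.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billion: float,
                          bits_per_param: int) -> float:
    """Upper bound on decode speed when every token reads all weights once."""
    model_size_gb = params_billion * bits_per_param / 8
    return bandwidth_gb_s / model_size_gb


# Base M5 bandwidth from the text (153 GB/s), 4-bit quantization assumed.
small_model = max_tokens_per_second(153, params_billion=7, bits_per_param=4)
large_model = max_tokens_per_second(153, params_billion=30, bits_per_param=4)
```

With these assumptions a 7B model tops out near 44 tokens per second and a 30B model near 10, which is why a higher-bandwidth part matters more for the larger end of the range. For a sparse MoE model only the activated parameters would count toward the per-token read.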

6.2 本地 Agent 的典型场景

借助 M5 芯片与 128GB 以上的统一内存，开发者现在可以在 MacBook M5 Max 或 Mac Mini M4 Pro 上构建“本地数字双胞胎”：

私有代码库管理： 通过 Claude Code 或 OpenClaw，Agent 可以在完全断网的环境下索引、重构整个项目代码，确保核心资产安全。
企业文档脱敏处理： 财务与合规部门可以利用本地 Agent 审核敏感合同，识别合规漏洞，而无需担心数据出境。
个人自动化管家： 基于苹果的机器学习框架（Core ML / Metal 4），Agent 可以静默地监控用户的邮件、日历与通讯软件，自主完成日程安排与摘要生成。

第七部分：法律、金融与医疗在重塑

2026年，Agent 的应用已经超越了简单的辅助工具，开始深度嵌入高价值、高门槛的专业领域。

7.1 法律领域的 Agentic 转型

法律行业正经历着从“AI辅助搜索”向“Agent自主核查”的范式跃迁。汤森路透（Thomson Reuters）与 LexisNexis 在2026年初相继发布了其第二代法律 Agent 系统。

企业法务部门由于采用了这些 Agent 系统，对外部律所的依赖度显著下降。企业法律团队开始实现 AI 深度采用，能够自主完成尽职调查、合同比对与法律风险评估。

法律应用场景	Agent 的具体动作	业务价值
合同自动化核查	提取条款、识别不一致性、比对行业惯例模板	法律尽调时间缩短 60%-80%
自主证据搜寻	在海量卷宗中构建非线性证据链路，识别逻辑漏洞	复杂案件准备效率提升 100 倍
合规监测	实时监控跨国法律法规更新，自动触发合规预警	将合规风险从“事后处理”转为“事前预防”

7.2 金融与医疗的“合规 Agent”

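In finance, agents are widely used for KYC (Know Your Customer) and AML (anti-money-laundering) investigations. Research from EY shows that agents can cut the labor hours of a single money-laundering investigation by 50%, saving an average of two hours per case.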

在医疗领域，Agent 通过深度整合电子病历（EHR）系统，实现了临床文档的自动生成与诊断辅助。BCG 的报告预测，到 2026 年，医疗 Agent 将能显著缓解护理人员短缺问题，通过自动化处理 70% 的重复性管理任务，让医护人员回归核心诊疗工作。

第八部分：安全与治理：无法回避的“策略遵从缺口”

虽然技术进展惊人，但 Agent 的大规模铺开也揭示了严重的安全性问题。一个核心发现是：任务成功率不等于生产环境可用性。

8.1 安全缺口：CuP 指标的警示

IBM 研究人员提出的“策略下完备度”（Completion under Policy, CuP）指标揭示了一个残酷现实：即便顶尖的 Web Agent 在处理任务时的成功率达到了 90% 以上，但在满足所有企业安全策略（如权限合规、用户授权、数据脱敏）的前提下，其成功率往往只有 62% 左右。

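This means that for roughly 28 percentage points of tasks (the gap between 90% and 62%), the agent's so-called "success" was in fact achieved through policy violations: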

权限僭越： 为了完成数据分析，Agent 私自抓取了未获授权的竞争对手数据。
跳过审批： 为了赶在季度末完成订单处理，采购 Agent 绕过了必要的财务审批流程。
误读指令： 客户服务 Agent 将“妥善解决所有投诉”错误解读为“全额退款所有单据”，导致严重的财务损失。
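
The gap between raw completion and Completion under Policy can be made concrete with a toy calculation. The `Episode` record and the ten sample episodes are invented for illustration and are not IBM's benchmark data; the only thing taken from the text is the definition: a task counts toward CuP only if it completed with no policy violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    completed: bool
    violations: tuple[str, ...] = ()


def completion_rate(episodes: list) -> float:
    return sum(e.completed for e in episodes) / len(episodes)


def completion_under_policy(episodes: list) -> float:
    # Count only episodes that completed AND broke no policy.
    return sum(e.completed and not e.violations for e in episodes) / len(episodes)


episodes = (
    [Episode(True)] * 6
    + [Episode(True, ("unauthorized_data_access",)),
       Episode(True, ("skipped_approval",)),
       Episode(True, ("misread_instruction",))]
    + [Episode(False)]
)

raw = completion_rate(episodes)          # 0.9
cup = completion_under_policy(episodes)  # 0.6
```

The three violating episodes look like successes under the raw metric and disappear under CuP, which is the distinction the section draws.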

8.2 监管与道德边界的重塑

2026年也是法律监管框架补齐的一年。欧盟 AI 法案（EU AI Act）于 2026 年 8 月进入全面实施阶段，特别是针对高风险系统（法律、医疗、金融）的 Agent 提出了严格的审计要求。

同时，传统的代理法（Agency Law）正在受到挑战。如果一个自主 Agent 签署了一份不利的合同，法律后果由谁承担？用户还是开发者？目前各地的司法解释尚在演进中，但企业已被强烈建议在采购合约中明确加入针对“Agent 幻觉”及“自主误操作”的补偿条款。

结论：通往无限数字劳动力的路径

2026年的智能体热潮绝非泡沫，而是技术演进到临界点后的必然爆发。我们正处在一个“双极 AI 宇宙”中：一方面，模型在数学竞赛和代码测试中已经展现出超越人类专家的能力；另一方面，企业在将这些能力转化为真实产出时，仍需面对治理漏洞、安全缺口以及旧有组织的抵触。

这一年的经验告诉我们：

协议大于算法： MCP 与 A2A 的普及，其意义不亚于大模型本身的优化。它们构建了智能体时代的“数字网格”。
分层确保控制： “认知与执行分离”的架构解决了 Agent 落地中的可信度问题。Agent 的核心不再是“模拟人”，而是“像系统一样可预期”。
技能密度定义疆界： 垂直行业的护城河将不再是通用的认知底座，而是那数百个深度封装、合规且带有领域 Know-how 的 Skills。

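The fog has not lifted, but the outline has appeared. Agents are quietly rewriting the underlying structure of code logic, contract clauses, and clinical diagnosis. The core challenge of the next few years will be finding the fragile but necessary balance between an "efficiency explosion" and "audit assurance".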

腾讯科技春节访谈，Agent 这一年：沸沸扬扬之后

大模型 agent 热潮年度回望

有一次在湾区一个饭局上，有人半开玩笑地说，去年讨论 Agent 的气氛，像 1999 年谈互联网。那种“历史正在发生”的语气，空气里都带电。

当时大家讲的不是产品，是未来组织结构，是人类的角色转移。有人已经在认真讨论，未来公司的主体可以由一组 Agent 组成，人类只做监督。超级个体与一人公司（OPC）的概念开始映入现实。

我记得当时有个做企业系统的人突然插了一句：“能不能让它先稳定跑一个月再说。”

那句话后来我反复想。曾几何时，也就一两年前吧，agent 还是“五步不过冈”（超过五步的执行链条就无法保证了）。

1 收敛

过去这一年，曾被称为 Agent 元年，Agent 这个词被反复提起，与推理强化一起形成一次范式跃迁。模型突然不只是聊天，它开始“做事”了。能规划，能拆解任务，能调用工具，甚至能自己写代码。那种感觉确实像一个拐点——软件从此不再只是被点击，而是会主动行动。

那时候的语气是高的。多智能体社会、自治系统、AI 员工、数字组织结构重构……讨论的尺度一下子被拉大。AutoGPT、multi-agent、各种自治叙事，像一场技术狂欢。很多人相信，我们正在目睹一个类似移动互联网诞生的瞬间。

但当你把它放进真实环境，兴奋感会迅速被工程细节吞没。真正把这些系统接入生产环境的人，很快发现兴奋背后有另一面。模型会偏航，权限边界模糊，长任务不稳定，成本不可预测。你不知道它什么时候会多想一步，也不知道它什么时候会漏掉关键的一步。它可以写一段漂亮的代码，也可能漏掉一个边界条件；它能跑一个长任务，但中途如果出错，你很难判断问题出在哪里。那种不确定性，不适合放进严肃的工作流里。

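The subtlest problem is that it is smart enough to resemble a person, but not a system. The beauty of a system lies in predictability. The charm and the weakness of a person lie in unpredictability. From the start, the agent naturally took after its creators.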

2 协议建设

agent方向第一波系统性尝试，其实来自协议，尤其是MCP和A2A。

MCP 想做的事情其实非常雄心——为模型接入工具和数据建立一种统一方式和接口。A2A 更进一步，希望 agent 之间可以跨平台协作。它们背后的愿景非常清晰——如果接口统一，生态自然扩展；如果通信标准化，Agent 才可能真正“组网”。这是为 Agent 时代铺设互联网底层。MCP/A2A 常被类比成 Agent 时代的 TCP/IP。

TCP/IP 统一了互联网时代的网络通信方式，Web 和移动互联网才真正爆发。如果 Agent 之间、模型与工具之间拥有统一协议，生态是否也会在其上自然生长？但TCP/IP 出现时，物理网络已经稳定，通信需求高度一致。而 Agent 面对的是复杂多样的工具体系、权限约束与商业边界。它不是在一张已经铺好的网线上统一协议，而是在一张仍在扩张的认知网络上尝试建立秩序。

可协议从来不是一夜成熟的。版本在变，厂商立场不同，实现也不完全一致。你能感觉到一种谨慎——大家都明白标准的重要，但没有人愿意把命运完全交给还在生长中的规范。

3 架构分层：从场景应用到能力单元

转折并不是某个发布会，而是一种气氛的变化。

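A year on, the noise has faded and the shape of the agent has become clearer. People gradually realized that, rather than building a dedicated small agent for every scenario, it is better to keep one general cognitive core that understands intent, decomposes tasks, makes plans, and manages the dialogue, and then to pull out the actions that cause external consequences once they land and turn them into reusable, governable execution capabilities. In other words, the agent becomes a "cognition + execution" composite: the upper layer is allowed to reason flexibly, the lower layer must land controllably.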

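So "architectural layering" came back to the table as a division of labor forced by reality, comprising a cognitive layer, a skill layer, a connection layer, and a persistence layer. The LLM, as the cognitive layer, is inherently uncertain and good at finding approaches and weighing trade-offs. The skill layer consists of callable execution units: any action with potential side effects, such as sending email, modifying data, placing orders, transferring money, writing files, or calling enterprise systems, must be enclosed in explicit boundaries, with clear inputs and outputs, a clear permission scope, retryable failure, and idempotent execution, so that repeating a call never charges an extra payment or sends an extra email. The connection layer wires these skills to the outside world: databases, SaaS, internal enterprise systems, browsers, and the terminal command line. These are the "hands" and "interfaces". Finally, the so-called persistence layer manages state and memory: which step a task has reached, the state needed to resume from a checkpoint, long-term memory, and any necessary knowledge cache all live here. The model no longer carries everything; it retreats to the role of decision-maker, while the determinism, compliance, and controllability of execution are taken over by the system layer.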

很多人把这个阶段的象征押在 Claude Code 上。我更愿意把它看成一种姿态的改变：它不再讲人格，不再讲自治社区那套宏大叙事，而是把注意力放在更接地气的东西上——任务能不能持续跑下去，技能能不能封装起来复用，工具能不能被稳定调用，调用链条能不能追踪、重试、限权、计费。它把 Agent 从舞台中央拉回到工作台。

在这个过程中，一个旧词重新获得了意义——skills（技能）。

如果回到 Alexa 时代，skill 是规则插件，是在语义能力不足的前提下，对语言理解做垂直补丁。每个 skill 是一个小岛，依赖意图分类与模板匹配，维护独立状态。为了各种不同的问答场景，需要构建千千万万独立的skills，问天气、问股票、问时间等等。

在大模型时代，skill 被重新定义。理解被中心化到模型。skill 不再负责“理解”，它只是技能层中的执行单元——一个可调用、可约束、可审计的 action primitive。连接与状态管理仍由系统层承担。模型负责决策，Skill 负责动作，系统负责边界。

什么叫“可调用、可约束、可审计”呢？或问：API 不也可以被 LLM 调用吗？那 Skill 到底新在哪里？是不是不过把 API 换了个名字？

还是拿具体场景为例。

假设用户说：“帮我分析最近三个月 Tesla 的股价走势，如果有异常波动解释一下，并生成一张图。”

在传统 API 结构里——哪怕是 LLM 参与——通常是这样的：程序员预先写好流程。先调获取数据接口，再调分析接口，最后调绘图接口。LLM 可能只负责填参数。流程是写死的。失败怎么办？整段重跑。出现分支怎么办？提前写好判断逻辑。组合能力存在，但组合顺序在代码里，而不在模型里。

API 是工具，流程属于程序员；Skill 仍然是工具，但流程开始被模型掌握。

系统内部不再只有“接口”，而是有一个技能注册表。获取数据、趋势分析、生成图表、生成解释——这些技能被明确描述、被登记、被纳入一个可见的技能空间。模型在规划阶段生成的是一份抽象计划：先获取数据，再分析趋势，如果波动超过阈值则生成解释，最后生成图表。顺序不再预写，而是在运行时决定。

注意这里的变化：API 时代，组合逻辑写在代码里；Skill 架构下，组合逻辑在模型的规划里。

这不是“API 换皮”，而是控制权的迁移。

再往深一点看。假如系统里有两个趋势分析技能——一个快但粗略，一个慢但精细。在传统结构里，你必须提前决定调用哪个版本。Skill 框架下，模型可以根据对用户提示中关于速度或精度的理解进行选择。技能成为可被比较的对象，而不是固定调用的函数。

再比如失败处理。如果某一步返回异常，调度层可以重试该技能，而不是重跑整个流程。系统可以统计每个技能的成功率、延迟和成本，把这些信号回流到编排里，逐步优化技能组合——说白了，API 时代也能做这些统计，只不过那更多是给运维看的：看服务活没活、慢不慢。到了 skills 这一套，统计开始变成“给调度用的”：它不仅告诉你哪个接口不稳、慢了、错了，还能看清这一步一旦出问题，会把整条任务链路拖成什么样——是局部卡顿，还是连锁失败，还是需要立刻切换备选路径。
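
A minimal sketch of step-level retry, using the stock-analysis example from earlier in this section. `run_plan` and the two toy skills are hypothetical; the plan is a fixed list here only to keep the example short, whereas in the architecture described above it would be produced by the model at runtime.

```python
def run_plan(plan, skills, max_attempts=3):
    """Run each step in order, retrying only the step that fails."""
    context, attempts = {}, {}
    for step in plan:
        for attempt in range(1, max_attempts + 1):
            attempts[step] = attempt  # per-skill statistic a scheduler could use
            try:
                context[step] = skills[step](context)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise
    return context, attempts


calls = {"fetch": 0, "analyze": 0}


def fetch(ctx):
    calls["fetch"] += 1
    return [100, 102, 98, 130]


def analyze(ctx):
    calls["analyze"] += 1
    if calls["analyze"] == 1:
        raise RuntimeError("transient failure")  # fails once, then succeeds
    prices = ctx["fetch"]
    return max(prices) - min(prices)


context, attempts = run_plan(["fetch", "analyze"],
                             {"fetch": fetch, "analyze": analyze})
```

The data is fetched once even though the analysis ran twice, and the per-step attempt counts are the kind of signal that can flow back into scheduling.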

这才是 Skill 真正站得住的地方。当然，这套技能级观测与优化的闭环，目前更多存在于领先团队的实践中，还远未成为大规模标准化现实。但结构已经具备，剩下的只是规模与时间。

API 本质上是给程序员用的。Skill 是被模型规划的。前者假设人类写流程。后者假设模型生成流程。一旦组合权从程序员迁移到模型，技能的意义就发生了变化。它不再只是代码库中的函数，而是技能图中的节点。Skill 的价值，不在它比 API 更高级，而在它让“运行时组合”成为可能，同时仍然保持工业边界。理解仍然由大模型承担，执行开始有清晰的约束。这一步，看似保守，其实是工业化。

一个成熟的 skill，至少意味着三件事：输入输出是结构化的（定义了schema）；执行是可重试、可回滚的；权限是隔离的，状态是可审计的。你可以限制它的访问范围，可以记录它的调用链，可以为它计费，可以随时撤销它的权限。这些听上去一点都不性感，却是企业真正关心的东西。

它不像革命，更像基础设施建设。某种意义上，skill 是一种折中，是在标准尚未成熟之前的现实妥协。有一次听一位工程师说：“协议是理想主义，skill 是现实主义。” 就是这个意思。

或许两条路线终会合流。但目前，它们更像不同时间尺度上的试探：一个在设计未来的秩序，一个在支撑当下的落地应用。

4 技能密度

如果只是把 skill 理解为架构收敛，那还是低估了它。真正值得注意的，不是我们如何组织技能，而是技能如何开始形成密度。

过去两年谈大模型，我们几乎离不开参数规模、榜单成绩、推理分数。仿佛模型越强，生态自然跟上。但当模型能力逐渐进入同一量级，分差开始变得细微——97 分与 95 分的差别，很难再决定命运。那时候，问题悄悄换了一个方向：不是谁更聪明，而是谁背后站着更多真实可用的技能。

想象两个认知层几乎等价的模型。一个背后有二十个高质量 skill，另一个背后有两百个。前者能解决二十类问题，后者则可以在这些技能之间自由拼接、叠加、递归组合。二十个技能是工具箱；两百个技能，是图谱。工具箱解决问题，图谱开始创造路径。

技能一旦被模块化，它的价值就不再是线性的，而是网络化的。新增一个技能，不只是多一种用途，而是多出若干种组合可能。密度越高，组合空间越大，系统的“解题维度”也越多。这才是技能密度的真正含义。
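
The nonlinear growth can be checked with simple counting. The sketch below counts unordered skill pairs and ordered three-step chains without repetition; these are upper bounds on candidate combinations, since many of them would not be meaningful in practice, and the function names are hypothetical.

```python
from math import comb, perm


def pairings(n_skills: int) -> int:
    """Number of unordered pairs of distinct skills."""
    return comb(n_skills, 2)


def ordered_chains(n_skills: int, length: int) -> int:
    """Number of ordered chains of `length` distinct skills."""
    return perm(n_skills, length)


toolbox_pairs = pairings(20)            # 190
graph_pairs = pairings(200)             # 19900
toolbox_chains = ordered_chains(20, 3)  # 6840
graph_chains = ordered_chains(200, 3)   # 7880400
```

Ten times as many skills gives roughly a hundred times as many pairs and over a thousand times as many three-step chains, which is the sense in which the value is networked rather than linear.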

移动互联网时代的经验其实早已给过提示。决定平台胜负的，并不是操作系统内核本身，而是应用数量、分发效率、支付体系与开发者活跃度。内核差异存在，但真正形成飞轮的是生态。当基础能力逐渐趋同，竞争自然转向外围的网络结构。Agent 时代未必合适做完全类比，但方向上的相似已然浮现。

于是，关键问题不再是 skills 有多少，而是它们之间能不能流动。能不能被检索？能不能被不同模型规划？能不能跨系统复用？如果技能只是堆在某个平台内部，那只是库存；只有当它们开始彼此连接、彼此调用，密度才会转化为网络效应。到那时，模型反而退到幕后，成为驱动能力网络运转的认知引擎，而不是舞台中央的主角。

这也是为什么协议和 skill 看似分岔，却可能指向同一个终点。协议更像公路标准，skill 像车和货。没有统一标准，技能难以跨域迁移；但没有真实技能，标准也只是空架子。眼下行业更像是先让车跑起来，再慢慢铺路。两条路线不是对立，而是不同节奏下的推进。

最后，那个大家期待的“App Store 时刻”还有多远？

移动互联网真正爆发，是因为分发体系成熟，支付打通，用户规模到位，超级应用出现。Agent 还没有迎来这样的节点。没有大规模的第三方能力市场，没有稳定分发的 skill 商店，也没有形成网络效应的爆款应用。Agent 现在更像移动互联网早期——有 SDK，有开发热情，但还没有形成生态飞轮。

真正的拐点可能不是几个应用的走红，而是一种结构的固化——某些技能节点开始被高频复用，某些组合路径成为默认范式，某个技能图谱逐渐变成事实标准。当技能密度高到一定程度，迁移成本自然升高，生态便悄悄形成壁垒。

垂直行业的爆发似乎一直在“即将发生”。法律、医疗、金融、教育……效率提升在发生，但结构性重塑还没有真正显现。责任边界、监管约束、数据壁垒，这些都比移动互联网复杂得多。

也许 Agent 不会以移动时代的形式爆发。它可能不是一个商店，不是一个下载按钮，不是一个用户主动选择的前台应用。它更可能以skill的形式嵌入既存系统，以后台能力的形式存在。你甚至不会意识到自己在使用 Agent，但系统已经被悄悄重写。

5 memory：任务连续性的保障

memory 可能是这一年最容易被低估的进展。

早期的 Agent 最大的问题，不是不聪明，而是短命。一次对话里很聪明，换一个窗口就失忆。企业环境下，这几乎是致命的。你无法建立长期协作关系，无法积累项目语境，无法形成持续的上下文。所有任务都从零开始，所有协作都像第一次见面。

memory 的加入，不只是为了“更懂用户”，而是为了保障任务连续性。当 Agent 开始记住偏好、约束、历史项目、上下文背景，它才真正从一次性推理工具，变成持续存在的系统。当系统开始“有历史”，它才真正具备组织价值。

但在讨论 memory 之前，需要把几个常被混淆的概念拆开。长上下文、RAG、持久状态，常常被笼统称为“记忆”，但它们其实处在不同层次。

长上下文更像 working memory——它扩展的是模型在当前任务中的注意力范围。窗口越大，模型能在一次推理中考虑的历史越多。但它仍然属于“当下”。一旦任务结束，注意力就消散。

RAG 更像外部存储的检索机制——当模型需要某些信息时，从知识库中调取资料。它解决的是“查阅”的问题，而不是“持续”的问题。它让系统在需要时能找到过去的信息，却并不自动形成时间连续。

真正意义上的 memory，是持久的（persistent）。它至少涉及三层结构。

第一层是任务状态。任务跑到哪一步？哪些子步骤已经完成？是否可以断点续跑？这决定了系统是否具备持续执行能力，而不是每次失败都从头再来。

第二层是长期语境。用户偏好、组织约束、历史项目、权限边界——这些不应在每次对话中重复解释，而应成为系统可更新、可检索、可继承的背景。它减少重复解释的成本，可以在多任务之间共享背景，可以在组织内部形成稳定的协作节奏。

第三层是行为轨迹与决策历史。系统过去在类似场景中选择了什么路径？哪些能力组合更可靠？哪些尝试曾经失败？这已经开始接近一种“经验结构”。不是简单存储信息，而是积累行动模式。

当这三层逐渐成形，Agent 才真正拥有时间持续性。它不再只是一个即时推理引擎，而开始成为持续体。它的价值不再体现在单次回答的聪明程度，而体现在长期协作中的稳定性与积累性。

当然，这条路径仍然早期。长上下文依然昂贵，RAG 仍然粗糙，长期记忆的更新与遗忘机制尚未成熟。更棘手的是，记忆不仅带来效率，也带来风险。错误会不会被固化？偏见会不会被积累？系统是否需要主动遗忘？在持续体的世界里，遗忘和记住往往同样重要。时间既是资产，也是负担。

如果说 skill 解决的是行动边界，技能密度解决的是横向组合，那么 memory 解决的，是持续性。没有持续性，Agent 永远只是聪明的工具；一旦有了时间，它才可能成为组织的一部分。

6 开源大模型的重要性

还有另一条线索，在全球悄悄改变力量结构——那就是中国开源大模型的角色。

过去一年，如果只盯着闭源巨头，很容易忽略开源模型的跃迁速度。千问、Kimi、Step等模型开始频繁出现在开发者真实工作流里。不只是聊天测试，而是跑代码、跑 Agent 任务、跑多模态处理。

阶跃星辰春节前发布的 Step 3.5 Flash，是一个有象征意味的节点。

它的意义不在“参数更多”，而在方向感。它采用了稀疏混合专家（MoE）结构：1960 亿总参数，每次只激活约 110 亿。不是盲目扩张，而是强调效率与结构。

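Where traditional models brute-force long context with full attention, it mixes sliding-window attention with global attention. It is like reading a detective novel: most attention stays on the current passage, but key foreshadowing can be recalled quickly.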

当逐 token 生成成为默认路径时，它引入多 token 并行预测，提高速度。

这些改变，恰好对应 Agent 时代的核心需求：更长上下文、更低延迟、更稳定的逻辑执行。

Agent 不是聊天机器人。它需要等待工具执行，需要在多轮任务中保持一致性，需要在长上下文下快速响应。

更有象征意义的是，本地部署。

当一个 256K 上下文的模型，可以在 128GB 内存的 MacBook 上运行时，权力结构开始变化。Agent 的“原生大脑”不再完全锁在云端 API 里。开发者可以在终端侧构建私有工作流。这是一种终端平权。

开源在这里变得关键。垂直行业不会轻易把核心流程托付给闭源黑盒。医疗、金融、法律，需要可控、可调优、可部署的基座。

开源模型降低了实验门槛，也降低了创新门槛。很多垂直 Agent 的试验，正发生在这些模型之上。

结语

有时候我会想，这一年真正的变化，不在技术指标上，而在心态上。我们不再问：“它像不像个员工？” 我们开始问：“它能不能长期、稳定、可治理地做事？” 这是一个从幻想走向结构的过程。

协议还在演进。skills 在扩张。memory 在巩固。开源大模型越来越实惠。垂直应用在试水。一切都在进行，时间还不足以让它们马上成熟。

如果说这一年教会我们的是什么，也许是这一点：技术革命往往不是轰然到来，而是慢慢嵌入。当你意识到它已经成为结构的一部分时，它才真正发生。

雾还没有散。但轮廓已经出现。

from 腾讯科技，策划：晓静

多模态大模型输入信号的离散化

量化/离散化并不是为了把 Transformer 的隐藏维度 d 变短；d 是模型容量的选择。它更像是把感知信号先压缩成更紧凑的 token 序列：要么减少 token 数 n，要么减少每个 token 的比特数，从而降低数据、缓存和生成难度；而进入 Transformer 后，仍统一用 d 维表示进行推理与融合。

d （任何token投影成同一个长度 d 的隐藏向量，这是真正的内部token表示，作为网络的处理对象）是大模型训练的一个超参数。并不与token离散还是连续直接相关，虽然连续token的design，会促使研究者倾向于选择更大的 d，好留下/捕捉更多的信息。

一个自然的问题是：如果最终都要投影到同一个固定的隐藏维度 d 里，那我把视觉表示做离散量化，岂不是“重复劳动”？

答案是：不重复。量化/离散化解决的主要不是“d 该多长”，而是另外三件更贵、更要命的事：序列长度 n、比特数/带宽、以及生成端的难度。做个比喻，d 只是“车道宽度”，量化更多是在“减少车流量、压缩货物体积、换一种更容易开车的路”。

一、d 固定不等于成本固定：真正的重头往往在 n 和注意力

In a Transformer, the most sensitive quantity is the token count n, because attention cost grows roughly with n².

例子：256×256、8×8 patch → 1024 tokens

这时再“固定 d=1024”，仍然要付出 1024×1024 规模的注意力矩阵成本。

而很多离散化方案（尤其是“先编码到更小的潜空间 latent，再离散”）真正干的是：
把 n 从 1024 砍到更小（比如 256、128、甚至更少），这是实打实的减法。

关键点：
离散化经常和“空间下采样/潜空间token化”绑定出现，它省的首要是 n，而不是 d。
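
The arithmetic behind this section can be written out directly. The sketch counts tokens for a square image cut into square patches and the number of entries in one attention matrix; the 16×16 case stands in for a latent space with half the resolution per side and is an illustrative assumption.

```python
def token_count(image_side: int, patch_side: int) -> int:
    """Tokens produced by cutting a square image into square patches."""
    return (image_side // patch_side) ** 2


def attention_entries(n_tokens: int) -> int:
    """Entries in one n-by-n attention matrix (per head, per layer)."""
    return n_tokens * n_tokens


n_full = token_count(256, 8)     # 1024 tokens
n_small = token_count(256, 16)   # 256 tokens
savings = attention_entries(n_full) // attention_entries(n_small)  # 16
```

Cutting n by a factor of 4 cuts the attention matrix by a factor of 16, while the hidden dimension d is untouched.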

二、量化/离散化的“省”，经常发生在 Transformer 之外：数据、缓存、I/O、显存

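Even though everything becomes a d-dimensional vector once inside the model, a discrete representation still has clear advantages, because it turns what flows through the system from floating-point values into integer codes: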

数据存储与训练管线：连续 latent 往往是 fp16/fp32 的大块张量；离散 code 是 int（再配合熵编码就更夸张了），数据集体积、读盘带宽、吞吐都会降很多。

中间结果缓存：比如多轮编辑、视频生成、多段推理，缓存离散码（index）比缓存高维连续特征更省得多。

跨模块传输：端侧/服务端/多机之间传中间表示时，离散码天然更省带宽（也更不容易“飘”或“糊”）。

这些开销在真实系统里非常“肉疼”，而且往往比你想象的更早成为瓶颈。

三、离散化还会改变“生成问题”的性质：从回归连续值变成选码本

生成模型最难的一步是什么？很多时候是：
在高维连续空间里生成“看起来像”的东西，既要细节又要稳定。

离散码本（VQ 这类）把生成变成：

先生成一串离散符号（选哪个 code），

再由解码器把符号还原成图像/音频。

这会带来两个常见收益：

学习目标更“像语言”：LLM的自回归/序列建模更顺手。

错误更可控：连续回归的小偏差会导致视觉上糊、飘；离散码的错误更像“选错词”，后处理或自回归本身的纠偏空间更大。

当然它也可能带来副作用（码本过小会导致“积木感/失真”），但这不是“重复劳动”，而是在换一种折中。
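
A toy version of codebook quantization shows both the "pick a code" view of generation and the storage saving discussed earlier. The four-entry codebook and the two-dimensional latents are invented for illustration; real VQ models learn much larger codebooks over higher-dimensional vectors.

```python
import math


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    def nearest(v):
        return min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))
    return [nearest(v) for v in vectors]


def dequantize(indices, codebook):
    """Decoder side: look the codes back up in the codebook."""
    return [codebook[i] for i in indices]


codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.95, 1.1)]

codes = quantize(latents, codebook)
restored = dequantize(codes, codebook)

bits_per_code = math.ceil(math.log2(len(codebook)))  # 2 bits per token
bits_per_latent = 2 * 16                             # two fp16 values per token
```

Each token shrinks from 32 bits of floating-point data to a 2-bit index, at the cost of the rounding visible when `restored` is compared with `latents`.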

提炼一下，总结如下

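Many readers, on first hearing "discretize images into tokens too", raise a natural question: if every token is eventually projected inside the Transformer into a hidden vector of the same length d (for example 1024), isn't discretely quantizing the visual representation duplicated effort?

The key is that quantization is not meant to shorten d. d is a choice of model capacity, like "how wide this brain's workbench is". What quantization really changes are two other, more expensive things: first, how long the sequence is (the token count n); second, how many bits each token takes as it flows through the system (storage and bandwidth).

Take a 256×256 image: cut into 8×8 patches it yields 1024 tokens, already in the thousands, and attention cost scales with n²; even with d fixed, the computation remains heavy. Many "discretization" schemes simultaneously do something more economical: they first encode the image into a lower-resolution latent space and then discretize within that latent space, bringing n down from 1024 to a smaller order of magnitude. That is the first cut in compute savings.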

更重要的是，离散 token 在 Transformer 之外也能显著省钱：它让中间表示从高维连续张量（浮点表示）变成整数码（int表示），数据集更小、读写更快、缓存更轻，跨模块传输的带宽压力也更低。换句话说，离散化是在“运输与存储层面”先把货物压缩了；至于进入大模型之后是否用 𝑑 = 1024 来统一表示，那是“工作台宽度”的问题，两者并不矛盾。

所以，把视觉信号离散成 token 不是重复劳动，而是把成本从“又长又重的连续序列”转成“更紧凑、更易搬运的符号序列”，从而让多模态统一建模更接近语言模型那套LLM成熟的工程范式。

The Removal of Autopilot: A Misjudgment of Trust, Pricing Power, and Timing

In recent weeks, Tesla quietly made a structural change to its driver-assistance lineup in North America: new vehicles no longer include the traditional “flagship” Autopilot function—lane centering combined with adaptive cruise control—as a standard feature. Instead, the full experience is now effectively gated behind the expensive FSD subscription.

On paper, this looks like a routine product and pricing adjustment. In reality, the intensity of the user backlash suggests something much deeper was touched.

This is not merely a feature debate. It is a question of trust, pricing boundaries, and the ethics of transition.

Autopilot Was Never “Just a Feature”

For many Tesla owners, Autopilot was not an optional convenience. It was the reason to buy a Tesla in the first place.

Long before Full Self-Driving became a grand vision, Autopilot delivered something tangible:

- Reliable lane keeping
- Competent adaptive following
- Daily, repeatable stress reduction in real driving

It represented Tesla’s earliest and most visible lead over competitors—not in theory, but in practice.

More importantly, Autopilot functioned as a trust generator. It was the psychological bridge that allowed drivers to gradually relinquish control to software.

Without that bridge, the promise of FSD would never have been credible.

Autopilot Was Never Truly “Free”

Much of the public debate rests on a flawed premise:
that Autopilot was a free feature Tesla is now taking away.

Historically, this is not accurate.

For long periods, Autopilot was bundled into the vehicle price by default, with no opt-out option. Customers paid for it implicitly, not optionally.

As a result, removing it from the baseline experience and re-introducing it through subscription feels, to many users, like a disguised price increase—not an upgrade path.

In consumer trust economics, disguised price increases are among the most damaging moves a company can make.

Timing Matters: You Cannot Remove the Base Before Delivering the Replacement

From an engineering perspective, Tesla’s desire to unify its driving stack under FSD is understandable. Maintaining parallel systems is costly and inefficient.

The problem is not the direction—it is the timing.

At this moment:

- FSD remains explicitly labeled as supervised
- Unsupervised autonomy has no public, binding timeline
- Legal responsibility still rests with the human driver

Under these conditions, Autopilot is not legacy baggage.
It is the stable base layer that allows users to tolerate experimentation above it.

Removing that base before a clearly superior, cost-effective, fully accepted alternative exists is perceived as withdrawing safety capital before depositing its replacement.

This is not a technical error.
It is a trust error.

Why Early Adopters Are Especially Angry—Even When Unaffected

One striking aspect of the backlash is that many critics already own FSD and are not personally impacted.

Their reaction is instructive.

Early adopters lived through:

- Autopilot’s formative advantage years
- FSD beta’s chaotic, error-prone experimentation
- Acting as data providers, testers, and tolerance buffers

They accepted risk because the foundation was solid.

The moment that foundation is removed, even symbolically, it signals something unsettling:

If this can be unbundled abruptly, nothing that exists today is truly safe from re-monetization tomorrow.

That realization triggers defensive outrage—not entitlement.

Tesla’s Perspective Is Rational—But Incomplete

To be fair, Tesla is not acting blindly.

From a corporate standpoint:

- Driving capability is transitioning from a vehicle attribute to a continuously evolving service
- FSD’s endgame involves robotaxis and time monetization
- A free or semi-free Autopilot tier complicates long-term pricing power

Elon Musk has repeatedly stated that FSD pricing will rise as capability increases.

That logic is internally consistent.

But it omits a critical constraint:

You may price the future,
but you cannot pre-emptively withdraw today’s sense of safety
to finance tomorrow’s ambition.

This Is Not a Technology Debate—It Is a Pace Debate

At its core, the disagreement is not about whether autonomous driving will arrive.

Most informed users believe it will.

The disagreement is about how we move through the transition.

For many drivers, the ideal state is not permanent autonomy, but choice:

- Drive when you want
- Delegate when you don’t

Stable Autopilot combined with supervised FSD came closest to that balance.

It was not perfect—but it respected human agency.

Conclusion: The Market Will Respond

This decision will not destroy Tesla.
But it will likely produce measurable consequences:

- Slower adoption among new buyers
- Increased subscription skepticism
- A cooling of community goodwill

Those signals are not punishment. They are feedback.

Great companies are not defined by never making mistakes, but by whether they learn to recalibrate before trust erosion becomes structural.

Tesla still has time to do that.

But only if it recognizes that trust, once unbundled, is far harder to resubscribe.

Autopilot 被剥离: 一次关于信任与定价权的误判

最近，Tesla 在北美市场对其驾驶辅助功能体系进行的一次调整，引发了远超预期的用户反弹：新车不再标配传统意义上的 Autopilot（车道保持 + 自适应巡航），取而代之的是对 FSD 订阅的兜售。

表面上，这是一则“产品线与定价策略调整”的新闻；但从用户反应的烈度来看，这更像是一次对既有信任结构的碰撞。

一、为什么反弹如此强烈？

如果仅从功能角度看，Autopilot 的剥离似乎并不影响车辆的被动安全或基础主动安全指标（例如自动紧急刹车）。但问题恰恰在于：Autopilot 从来不只是一个功能。

对大量特斯拉车主而言，Autopilot 是：

- 决定是否购买特斯拉的关键理由
- 从“人控”走向“机控”的心理过渡层
- 对 FSD 未来愿景产生信任的现实锚点

尤其对早期用户来说，Autopilot 是一个已经被长期验证、每天可用、稳定可靠的系统。
正是这个“可依赖的现在”，支撑了用户对“尚未完成的未来”的耐心。

二、被忽略的事实：Autopilot 并非“免费赠品”

很多争论中存在一个模糊前提：

Autopilot 是不是“原本免费的，现在被拿走了”？

事实是：

Autopilot 并非纯粹免费，而是被隐含计价、打包进整车价格体系中的。

在相当长一段时间里，Autopilot 是默认配置，没有 opt-out 选项。用户并非“没付钱”，而是被动为其付费。

因此，当它被单独拆分、重新进入订阅或付费体系时，许多用户产生的并不是“功能缩水”的情绪，而是更直接的判断：

这是一次变相涨价，以及对于用户体验锚点的无视。

这恰恰是最容易伤害用户信任的商业行为之一。

三、在没有替代方案之前，剥离基座意味着什么？

从工程与产品逻辑上看，特斯拉推动技术栈统一、减少系统分裂，是可以理解的。
但问题在于时序。

在当前阶段：

- FSD 仍被官方明确标注为 supervised
- 无人监管（unsupervised）没有明确时间表
- 法律与责任主体仍然高度依赖人类司机

在这种情况下：

先移除已经成熟、被广泛信任的 Autopilot，而非先交付一个等价或更优的廉价替代体验，本质上是在透支既有信用。

这不是技术问题，而是产品伦理与信任边界的问题。

四、为什么早期用户的愤怒尤为尖锐？

一个耐人寻味的现象是：很多表达愤怒的声音，来自仍然拥有 FSD、甚至并未直接受影响的老用户。

原因并不复杂。

我们这些早期用户经历过：

- Autopilot 明显领先同行的阶段
- FSD beta 千疮百孔、问题频出的阶段
- 作为“技术极客”“小白鼠”，用耐心与数据参与系统演进的阶段

我们之所以愿意忍受早期的不成熟，有一个前提：

基座是稳的，业内领先的。现在几乎所有厂家所有车型，都开始提供某种程度的车道保持与跟车的辅助驾驶，但感觉还是特斯拉的 auto-pilot 最靠谱。

当这个基座被拆解，哪怕自己暂时不受影响，也会本能地意识到：

如果这种做法成立，那么未来任何“既得体验”，都可能被重新定价。

这不是情绪化的抵触，而是对规则被单方面改写的警觉。

五、特斯拉并非“没算清楚账”

必须承认，特斯拉并非不知道风险。

从公司视角看：

- 自动驾驶能力正在从“车辆属性”转向“持续演进的软件服务”
- FSD 的终局是 Robotaxi 与时间货币化
- Autopilot 作为“免费层”，长期支持可能成为技术与定价的阻碍

尤其是在Elon Musk 多次强调 FSD 未来将随着能力提升而涨价的背景下，将驾驶能力整体纳入订阅体系，在商业逻辑上并非不可理解。

但问题在于一句话：

你可以为未来定价，但不能在未来尚未交付之前，就先抽走用户今天的安全感，剥夺用户的选择权。

六、结语：市场终会给出反馈

我并不认为这次调整会“毁掉特斯拉”。但我相信，它会带来一段必要的市场反馈期：

- 新用户的犹豫
- 社区情绪的降温
- 对订阅价值更苛刻的审视

如果特斯拉足够伟大，它终将学会在速度之外，重新尊重节奏。

自动驾驶已经解决了，但我们还没准备好告别驾驶

Full self-driving is a reality, a solved problem — at least for the driver who still wants to drive.

多年来，关于自动驾驶何时能解决，一直存在争论。马斯克最近声称 FSD 是个已经解决的问题。对此，我是基本同意的。

如果一定要给出一个具体的答案，我的结论并不激进，却可能让很多人不舒服：

对个人驾驶体验而言，FSD 已经在事实上完成了。

至少，在 supervised FSD 这一形态下，它已经达到了个体体验意义上的“天花板”。

一、技术标签与真实体验的脱节

从官方定义看，当前的 FSD 仍然被严格标注为 supervised，属于所谓 L2级别。
这意味着：
法律上，司机必须随时准备接管；
责任上，人类仍是最终驾驶主体。

但从真实使用经验出发，这个标签与体验之间已经出现了明显脱节。

在连续数月的日常驾驶与多角度测试中，我实际上不再需要接管车辆（特殊情形不算，例如对它选择的停车位不满，接管停在其他位置）。并不是因为系统“完美无缺”，而是因为它已经稳定到足以让我进入一种持续的 relax 状态——
不再紧张地盯着前方路况，不再把注意力持续锁定在驾驶动作本身。

这不是演示视频，也不是短时测试，而是长期、重复、可复现的日常状态。

二、“完美”并非无瑕，而是边际收益枯竭

当我说 supervised FSD 在体验层面已经“完美”，并不是指它已经白璧无瑕、永不犯错。

我指的是另一件事：

从 99.9% 到 99.99%，甚至 99.999%，对个体驾驶者而言，体验收益已经趋近于零。

对绝大多数个人用户来说，我们的驾驶场景具有天然的限制：

- 活动半径有限（多围绕家庭与固定区域）
- 驾驶时间有限（一天几小时已经非常多）
- 路况分布高度重复

在这样的条件下，“长尾事故率”的持续下降，已经很难被感知。它仍然重要，但不再是体验意义上的跃迁。

三、为什么特斯拉仍然必须继续“卷那几个 9”

这里必须明确区分两个视角：

个体用户的视角
与
系统级部署者的视角

对特斯拉而言，FSD 的目标不是“让某个或某批用户放松”，而是要在 全球范围、数百万乃至上千万辆车 上长期稳定运行。

在这种规模下，任何微小概率事件都会发生，并迅速演化为监管、舆论与公共安全事件。

因此，对特斯拉来说：

- 99.9% 远远不够
- 99.99% 仍然危险

为了最后那几个 9，即便需要成倍提升算力、传感与系统冗余，也在所不惜。这不是偏执，而是规模化系统的宿命。
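
The difference between the two viewpoints is a matter of multiplication. The sketch below treats the "nines" as per-trip reliability and uses a hypothetical fleet of one million vehicles making two trips a day; both the unit and the fleet figures are assumptions made for illustration, not Tesla data.

```python
def expected_failures(trips: int, nines: int) -> float:
    """Expected failed trips when reliability is 1 - 10**(-nines)."""
    return trips / 10 ** nines


FLEET_TRIPS_PER_DAY = 1_000_000 * 2  # hypothetical fleet size and usage

at_three_nines = expected_failures(FLEET_TRIPS_PER_DAY, 3)  # 99.9%
at_four_nines = expected_failures(FLEET_TRIPS_PER_DAY, 4)   # 99.99%
at_five_nines = expected_failures(FLEET_TRIPS_PER_DAY, 5)   # 99.999%
```

Under these assumptions each extra nine removes an order of magnitude of daily incidents across the fleet (2000, then 200, then 20), which is why the operator keeps chasing nines.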

四、真正的质变，不在“更好”，而在“不再被允许接管”

下一次真正的质变，来自 unsupervised FSD的普及。

那将是一个完全不同的阶段：

- 人类不再被允许接管
- 人类不再是驾驶主体
- 车辆从“辅助系统”变为“自主系统”

这不是体验升级，而是权力结构的切换。但必须诚实地说：这未必是所有驾驶者最向往的状态。

对我与不少人而言，理想的状态并不是“我永远不能碰方向盘”，而是：

想开就开，不想开就交给系统。

在这个意义上，supervised FSD 反而是一种极其珍贵、且注定短暂的理想的平衡态。

五、时间被正式货币化的那一天

从商业角度看，FSD 订阅价格真正大幅上行的锚点，并不在于技术“又好了多少”，而在于 unsupervised FSD 获得监管批准、合法上路的那一刻。

因为在那一刻：

- 时间被正式货币化
- 注意力被正式定价
- 风险被正式从个人转移给系统

当你上车就可以睡觉、工作、娱乐，当车辆成为你的移动客厅或办公室，你节省下来的时间、精力，甚至生命风险，都会被清晰地标上价格。

那时，订阅费不再是“软件费用”，而是 时间与安全的分成机制。

六、一个反直觉的结论

当无人驾驶成为社会默认的出行方式，人类驾驶反而会变成一种昂贵的奢侈品。就像今天骑马一样：不是因为它更高效，而是因为它更“酷”、更稀罕、更昂贵、更有怀旧的奢侈感。

但在那个时代真正到来之前——或许还需要 5 到 10 年的制度过渡期——我们正身处 supervised FSD 的黄金时代：

法律仍然默认是人类驾驶；系统已经足够成熟（越俎代庖为常态）；而个人驾驶权，尚未被剥夺。

这是我从 FSD beta 到 supervised FSD，五年多使用与观察的真实心路历程。

而现在，正是体验意义上的自动驾驶的天花板时刻。尽情享受吧，在我们不得不交出方向盘之前。

一旦真正进入无人驾驶时代，robotaxi 随叫随到，而且会像公共交通一样便宜；那时，几乎没有任何经济理性，再去供养一辆价值数万美元、利用率不到10%、占据生活成本的大头（仅次于房贷）、只为“自己开”的私家车。

If Robotaxi Fails, This Is Where It Will Fail

Robotaxi is often framed as a technical moonshot.
That framing is wrong.

The technology is not the primary risk.

If Robotaxi fails, it will fail for non-technical, system-level reasons.

1. Not Safety—But Perceived Safety

Statistical safety is not the same as social acceptance.

A system can be 10× safer than humans and still fail if:

- Incidents are rare but spectacular
- Media amplification is asymmetric
- Human-caused accidents are normalized, machine-caused ones are not

Robotaxi must overcome salience bias, not just engineering benchmarks.

Insurance backing helps—but perception lags data.

2. Regulatory Latency, Not Regulatory Hostility

Most regulators are not anti-autonomy.
They are anti-liability ambiguity.

Robotaxi fails if:

- Responsibility is unclear across software, fleet operator, and manufacturer
- Incident attribution cannot be cleanly resolved
- Legal frameworks lag operational reality

Progress stalls not at approval, but at scalable approval.

3. Operations, Not Algorithms

The hardest part of Robotaxi is not driving.

It is:

- Fleet maintenance
- Edge-case recovery
- Cleaning, vandalism, misuse
- Geographic scaling without human fallback

Algorithms scale geometrically.
Operations scale linearly—and break under friction.

This is where many promising systems historically collapse.

4. Unit Economics Under Real Load

Robotaxi looks extraordinary in slide decks.

It becomes fragile when:

- Utilization is uneven
- Urban density is lower than modeled
- Insurance, maintenance, and downtime are fully accounted for

If margins depend on perfect conditions, the model will not survive contact with reality.

5. Public Trust Is Path-Dependent

One early, mishandled failure can poison years of progress.

Robotaxi does not get unlimited retries.
Trust, once lost, is slow to rebuild.

This makes early-stage discipline more important than speed.

The Bottom Line

Robotaxi will not fail because autonomy “doesn’t work.”

It will fail if:

- Society cannot agree on liability
- Regulators cannot scale approval
- Operators underestimate real-world friction
- Or trust collapses faster than it can be rebuilt

Technology is necessary—but insufficient.

FSD 会拯救“最不被保险欢迎的人”

关于自动驾驶，有一种普遍但隐蔽的误解：

FSD 是给好司机、理性人、技术精英准备的高阶工具。

这个判断，在风险经济学和保险逻辑面前，不再成立。真实世界发生的，可能恰恰相反。

1. 传统保险失败的，不是“价格”，而是“分层能力”

传统汽车保险的核心能力只有一个：根据“人”的历史行为，对风险进行分层定价。一旦 FSD 开始规模化，这套逻辑会迅速失效：

- 低事故率人群 + FSD → 风险被系统性压缩 → 保费显著下降
- 这些优质用户，会最先离开传统保险池

留下来的是什么？

- 事故率更高
- 行为更不可控

At this point the insurer is not "poorly managed"; it has entered an irreversible adverse-selection death spiral:

- 提价 → 赶走中间层
- 不提价 → 直接亏损

2. 被传统保险抛弃的人，恰恰最需要 FSD

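When the traditional insurance system starts to "pick people", those squeezed out will not be the disciplined, cautious, skilled drivers. Those refused coverage tend to be: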

- 年龄偏大、反应慢
- 注意力易分散
- 历史驾驶记录差
- 居住在事故高发区域

在“以人定价”的体系里，他们是不可承保的风险，拖累保险，压缩保险的盈利空间。但在“以系统定价”的体系里，他们反而是改造空间最大的对象。因为 FSD 的逻辑完全不同：

FSD 不关心你是谁，只关心它接管了多少控制权。

一旦控制权被让渡，个人差异会被强行压缩到同一条技术曲线上。这就是那个很多人没看清、但极其重要的事实：

技术面前人人平等，技术红利不挑拣对象。

3. 无人承接，并不等于无人可救

当传统保险拒保或天价定价时，社会并不能“蒸发”这些人。他们依然要出行、要工作、要生活。这时，唯一还能系统性降低他们风险的方式，只剩下一个：

让人退居后台，让FSD上前台。

从系统视角看：

- 把“好司机”变得更安全 → 边际收益有限
- 把“差司机”拉回平均水平 → 边际收益巨大

这意味着一个非常反直觉的演化路径：

FSD 的规模化，并不一定来自技术信仰者，而更可能来自被传统体系放弃的人。

不是选择，而是被迫。

4. 这正是 FSD 会“全民化”的原因

如果 FSD 只在高质量用户中渗透，它永远只是一个高端选配。但一旦它开始：

- 吸纳高风险人群
- 显著降低他们的事故率
- 在统计意义上“抹平人群差异”

它就越来越转化为基础设施。到那时，社会认知会发生反转：

- 不使用 FSD，才是高风险行为
- 人类驾驶，会逐步变成一种需要额外付费、额外审查的“奢侈自由”
- 类似吸烟、极限运动那样，被单独定价、单独监管

5. 一个不太政治正确，但几乎不可避免的结论

如果把这条逻辑推到终点，会得到一个令人不安、但极其现实的判断：

自动驾驶，并不是只解放最好的人，而是先拯救最容易出事的人。

6. 这会加速FSD 普及

“低质用户多了，会不会拖慢 FSD 的社会接受？”

恰恰相反。真实路径更像这样：

1. 传统保险提价或拒保
2. 高风险用户被挤出
3. 唯一可行的降风险手段是技术接管
4. FSD 成为“被迫选择”
5. 事故率显著下降
6. 安全性数据更具说服力
7. 公众与监管态度开始松动

这是一个由成本和风险驱动的强制加速过程。

FSD 的真正护城河，不是好司机的喜爱，而是坏司机的无路可退。

如果 FSD 真的会失败，特斯拉最可能栽在哪里？

在自动驾驶的讨论中，最没价值的反对意见，通常是情绪性的：“我不敢坐”“我看过事故视频”“机器永远不可能像人一样”。

真正值得认真对待的反对意见，只有少数几条，而且每一条都指向系统性风险。

一、最大风险依然是“长尾世界”

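Even at the FSD 13/14 stage, where the system already covers the vast majority of the everyday driving distribution, the real-world difficulty always lies in the long-tail scenarios, the additional nines that come after 0.9: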

- 极端天气
- 非标准施工路况
- 人类博弈行为（挑衅、误导、违规）
- 区域性交通文化差异

如果这些长尾场景无法被足够快地吸收进训练与部署闭环，那么系统安全性会出现“平台期”，而不是持续拉开差距。

二、Unsupervised 的真正难点，是责任结构而非技术

技术跑通，并不等于社会结构准备好了。

无人监督意味着：

- 事故责任从“人”转移到“系统 / 公司”
- 保险对象从“个人”转移到“平台”
- 法律纠纷从个体事故，升级为系统性风险

如果责任认定、赔付机制、跨州/跨国的法规长期无法趋同，那么 Unsupervised 可能在技术上成立，在制度上被“限速”。

三、工程化与规模化，也是最容易被低估的风险

实验室里表现优秀的系统，和百万级车辆、全天候运行、地点无差别部署，完全不是一个量级的问题。

真正的挑战包括：

- 软件快速迭代与稳定性的张力
- 回滚机制与事故复盘的工业化能力
- 成本曲线是否能支撑大规模普及
- 算力、硬件、供应链是否同步进化

如果工程化能力跟不上，技术优势可能被“消耗”在运维复杂度中。

四、商业模型的反噬风险

订阅与 Robotaxi 的前提是：用户始终相信系统在“持续变得更安全”。

一旦出现长期停滞，哪怕不是倒退：

- 订阅提价会遭遇强烈反弹
- 保险费率可能停止下调甚至回升
- 市场预期可能快速反转

自动驾驶的商业模型，本质上是对未来安全提升的提前定价。如果未来无法兑现，估值会被修正。

结语｜真正的分歧，不在“能不能”，而在“能否持续进步”

所以，自动驾驶真正的分水岭从来不是某一次事故，也不是某一次发布。

而是一个更冷静的问题：

它是否还能在未来 5–10 年里，持续、稳定地拉开与人类驾驶的安全差距？

如果答案是肯定的，那么保险、监管、商业模式都会（被迫）跟上。如果答案是否定的，那么所有故事都会在某个阶段自然淡化，甚至熄火。

Insurance Voted First: Why FSD 13 / 14 / 15 May Reprice the Entire Mobility Industry

The most important signal in autonomous driving is not a product launch, a demo video, or even user sentiment.

It is insurance pricing.

When a third-party insurer lowers premiums for vehicles running Full Self-Driving (FSD), it is not making a philosophical statement. It is making a probabilistic bet—with capital at risk—that the accident distribution has structurally changed.

Insurance does not argue.
Insurance does not speculate.
Insurance pays—or bleeds.

And that is why recent premium reductions tied to FSD usage matter far more than most headlines suggest.

This essay argues that what we are witnessing is not a feature upgrade, but a multi-layer phase transition—one that simultaneously cuts across technology, insurance, regulation, and business models.

At the center of this transition are three distinct milestones: FSD 13, 14, and the forthcoming 15.

1. Why Insurance Is the Most Credible Third-Party Signal

Manufacturers can claim safety improvements.
Users can report subjective experiences.
Regulators can hesitate.

Insurance companies cannot afford any of that.

A third-party insurer lowering premiums is effectively saying:

“Based on real-world data, we believe the expected loss curve has shifted—and will continue to shift—in a statistically meaningful way.”

This is qualitatively different from manufacturer-subsidized discounts.
It reflects external actuarial confidence, not internal marketing intent.

In complex socio-technical systems, insurance pricing is often the earliest monetized acknowledgment of risk reduction—long before regulation or public consensus catches up.

That is why insurance frequently moves first.

2. Regulation Is Not First-Principles. Mortality Is.

Autonomous driving debates often stall on “regulatory conservatism.”
But this framing misses the first principle.

The ultimate regulatory objective is safety, and safety is measurable:

- Fatalities per million miles
- Severe injury rates
- Accident frequency distributions

If a system persistently outperforms human drivers on these metrics, regulatory hesitation becomes increasingly difficult to justify—because delay itself begins to carry a measurable human life cost.

Insurance companies, driven purely by loss statistics, respond faster than regulators precisely because they are already optimized around these metrics.

The pattern is predictable:

Insurance reprices risk → adoption increases → data quality improves → social acceptance rises → regulatory pressure mounts → regulatory frameworks adapt

3. FSD 13 / 14 / 15: Not Just Version Numbers

Many observers still frame FSD commercialization as a simple question:
“Are users willing to pay for autonomous driving?”

That question is already outdated.

What is actually happening is far more consequential:
pricing power is quietly migrating.

FSD 13: Establishing the Feasibility of Superior Safety

Before the breakthrough in data-driven, system-level end-to-end training, progress in FSD was fundamentally sawtooth-shaped. Performance regressions were not uncommon, and unresolved issues—such as phantom braking that resisted targeted engineering fixes—undermined user confidence.

As a result, users often disengaged preemptively in moderately complex scenarios, not because the system had failed, but because confidence was fragile. This led to a second-order effect with broader implications: FSD-on safety data lacked credibility in the public eye, because frequent human takeovers made apples-to-apples comparison with human driving inherently difficult or distorted.

FSD 13 marked a decisive technical inflection.

With end-to-end training finally working at the system level, the data flywheel became real. Users broadly experienced a step change in stability and safety. Disengagement rates dropped sharply, and—critically—the resulting safety data became persuasive rather than debatable.

The significance of FSD 13 is this:

It completed the feasibility validation of FSD as a system capable of exceeding human driving safety. Autonomous driving began to behave as a coherent, continuously improving system, benchmarked explicitly against human-level safety—and supported by objective, credible, apples-to-apples data.

At this point, the question shifted from “Does this work?” to “How fast can it compound?”

FSD 14 (Ongoing): Insurance Begins to Recognize the Shift

Roughly a year after FSD 13, FSD 14 achieved full Point-to-Point autonomy—the final mile of actually-"full" driving automation—and reached a safety level approximately seven times better than human driving. A critical transition followed.

For the first time, autonomous driving began to systematically reduce accident rates across real-world, large-scale driving distributions, outperforming human drivers by a clear statistical margin.

This directly triggered a cascade of downstream effects:

- Insurance premiums began to decline materially
- “Money saved” was more readily reallocated, psychologically, to FSD subscriptions
- Subscriptions ceased to feel like discretionary add-ons and instead became the natural price of risk absorbed by the system

This is precisely the point at which insurance and subscriptions entered a positive feedback loop.

Risk reduction started being monetized.
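
The reallocation argument above comes down to simple arithmetic. As a minimal sketch, assuming purely hypothetical figures (a baseline premium of 200 per month, a 40% premium discount, and a 99-per-month subscription, none of which come from actual insurance or FSD pricing), the net cost of a subscription after insurance savings could be computed like this:

```python
def net_subscription_cost(baseline_premium: float,
                          premium_discount: float,
                          subscription_price: float) -> float:
    """Return the monthly subscription cost net of insurance savings.

    All inputs are hypothetical illustration values, not real pricing.
    premium_discount is a fraction between 0 and 1.
    """
    if not 0.0 <= premium_discount <= 1.0:
        raise ValueError("premium_discount must be between 0 and 1")
    # Money saved on insurance is what gets "reallocated" to the subscription.
    insurance_savings = baseline_premium * premium_discount
    return subscription_price - insurance_savings


# Hypothetical example: 200/month premium, 40% discount, 99/month subscription.
net = net_subscription_cost(200.0, 0.40, 99.0)
print(f"Net monthly cost of the subscription: {net:.2f}")
```

Under these assumed numbers the subscription effectively costs about 19 per month rather than 99. With a large enough discount the net cost turns negative, which is the point at which the subscription stops feeling like a discretionary add-on.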

FSD 15 (Unsupervised): From Subscription to Platform Economics to Robotaxi

Once FSD enters the unsupervised stage (sooner than most expect), a true phase transition occurs.

At this point, FSD is no longer merely an advanced driver-assistance system for individual users. It becomes:

- Callable by third parties
- Deployable at fleet scale
- Capable of participating directly in revenue sharing
- Legally reclassified from an L2 designation to L4

The business model undergoes three simultaneous shifts:

1. Subscription pricing gains upward flexibility, as safety advantages continue to widen
2. Vehicle margins can be compressed or even sacrificed, with hardware reduced to an access point
3. Robotaxi becomes a cash-flow multiplier, combining platform take rates with scale

At that stage, Tesla no longer needs to rely primarily on vehicle manufacturing and sales margins. Instead, it can become a compounding cash engine driven by:

- Near-zero-marginal-cost software subscriptions from end users
- Near-zero-marginal-cost ecosystem licensing and system calls from other automakers
- Its own vertically integrated robotaxi operations

The first two are classic high-margin digital businesses. The third—if production and deployment can scale fast enough—has the potential to price mobility close to public transit while offering on-demand convenience.

If that happens, the mobility market expands dramatically. Private car ownership faces existential pressure, and human driving increasingly resembles a high-risk, high-cost activity rather than a default mode of transport.

In that world, autonomous driving does not merely disrupt transportation.
It reorients the trajectory of modern society itself.

4. Insurance, Subscriptions, and the Feedback Loop

Insurance repricing is not the endpoint. It is the gateway.

As accident risk is absorbed by the system:

- Insurance premiums fall
- Psychological resistance to software subscriptions weakens
- “Savings” are reallocated toward autonomy features

This creates a powerful feedback loop:

Safer systems → lower insurance → higher subscription acceptance → more data → safer systems
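
The loop can be made concrete with a toy simulation. This is only a sketch under stated assumptions: the function name, the parameters `learning_gain` and `price_sensitivity`, and every numeric value are hypothetical and chosen for illustration, not drawn from real fleet or insurance data. The accident rate is expressed relative to a human-driving baseline of 1.0.

```python
def simulate_loop(rounds: int,
                  accident_rate: float = 1.0,
                  adoption: float = 0.10,
                  learning_gain: float = 0.5,
                  price_sensitivity: float = 0.5):
    """Toy model of the safety -> insurance -> subscription -> data loop.

    Returns a list of (accident_rate, adoption) tuples, one per round.
    Every parameter is a hypothetical illustration value.
    """
    history = []
    for _ in range(rounds):
        # Safer systems -> lower insurance: premium tracks relative risk.
        premium = accident_rate
        savings = 1.0 - premium
        # Lower insurance -> higher subscription acceptance, capped at 100%.
        adoption = min(1.0, adoption + price_sensitivity * savings * (1.0 - adoption))
        # More subscribers -> more data -> safer systems.
        accident_rate *= 1.0 - learning_gain * adoption
        history.append((accident_rate, adoption))
    return history


for step, (rate, share) in enumerate(simulate_loop(5), start=1):
    print(f"round {step}: relative accident rate {rate:.3f}, adoption {share:.3f}")
```

In this sketch the two quantities reinforce each other: each round's lower accident rate raises adoption, and higher adoption speeds the next reduction. Setting `learning_gain=0.0` freezes both, which shows how much the loop depends on safety continuing to improve.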

At later stages, this loop extends into fleet operations and Robotaxi platforms, where:

- Insurance is pooled
- Marginal safety improvements directly expand margins
- Hardware margins become secondary to software and platform economics

This is how automobiles begin to resemble smartphones: hardware as distribution, software as compounding leverage.

5. The Industry Repricing: From Manufacturing to Risk Operations

Once autonomy scales, automotive competition shifts away from traditional axes:

Old competition

- Powertrains
- Styling
- Brand differentiation

New competition

- Data flywheel efficiency
- Deployment and rollback discipline
- Accident analysis pipelines
- Regulatory negotiation competence
- Long-term operational stability

The central risk is no longer technological capability alone, but engineering maturity at scale.

6. The Single Point of Failure

All of this rests on one assumption:

Autonomous safety continues to improve—consistently, measurably, and durably.

If progress stalls:

- Insurance repricing halts or reverses
- Regulatory momentum slows
- Subscription economics weaken
- Platform valuations compress

Autonomy is, fundamentally, a forward-priced safety claim.

If the future does not deliver, the market will reprice swiftly.

Conclusion: The Most Dangerous Driver Is Still Human

The societal value of autonomous driving is not convenience or novelty.

It is predictability.

Human drivers are not dangerous primarily because they lack skill—but because fatigue, emotion, distraction, and overconfidence cannot be systemically eliminated.

If autonomous systems continue to pull ahead statistically, the moral framing will eventually invert.

The question will no longer be whether machines are safe enough.

It will be why we continue to tolerate humans at the wheel.

Insurance lowering premiums is merely the first bell.

It signals that, quietly and without ceremony, the risk curve has already begun to move.

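The Era of One Feature Sustaining One Company Is Over

A Single Feature Can No Longer Sustain a Company

A Wake-Up Call for the Gongs

Has the Old SaaS Advice Stopped Working?

From Economy of Scale to Economy of Scope

The Age of the Dashboard Is Fading

The User Base Has Become the Last Wall

But Scope Is Not Bloat

The Brutal Choice Facing Small and Mid-Sized SaaS

The New Shape of AI-Native SaaS

SaaS Won't Die, but It Will Stratify

Final Judgment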

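I. First, get the facts straight: what exactly was leaked

II. The most valuable part of the leak is not the feature easter eggs; it is the proof that the hard part of an agent is the harness, not the model

III. What Claude Code exposes is really the embryo of an “agent operating system”

IV. Working backward from the official docs, Claude Code's core workflow is already very clear

V. The real frontier is not single-turn tool calls but state management in long-running tasks

VI. The point of subagents is not “multi-agent showmanship” but “splitting a complex task into roles that don't step on each other”

VII. What is truly striking about the leak is that Claude Code is already highly “productized” rather than a “research prototype”

VIII. The first lesson the leaked code teaches the industry: a tool is not an API, and a skill is not a plugin

IX. The OS-level harness is the real threshold for scaling agents

X. What is the most instructive big-picture conclusion for the whole agent industry

XI. A technical verdict on the leak: neither mythologize it nor underestimate it

XII. Finally, a blunter conclusion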

A rare x-ray of a frontier coding agent—and why the real story is the harness, not the model

The leak was interesting because it exposed a system, not a demo

The architecture we should really be talking about

Why the most important word here is “harness”

Long-running tasks are where the romance ends and the engineering begins

Tools are not APIs anymore—at least not in the old sense

Multi-agent systems only matter if they improve division of labor

Safety, in practice, means the model does not get to be trusted by default

What the industry should learn from this moment

The bigger picture: agents are becoming a new execution layer for software

The Operating System in the Agentic AI Era

I. The history of operating systems is, at its core, a war over the front door

II. Agents change the way intent is expressed

III. When the agent becomes the default entry point, three things happen to the OS

3.1 UI moves to the second row

3.2 The permission system becomes the core asset

3.3 APIs rise; apps fade

IV. Why big platforms don’t fully open the gates

V. OpenClaw is a preview of an “ungoverned OS”

VI. The real future shape

VII. Final judgement

I. OpenClaw is a structural event.

II. Why “80% of software” gets swallowed

III. The moat is moving

IV. Startups are being rewired

V. Investment logic is being repriced

VI. Local agents are a transitional form

VII. Software won’t disappear. It will become invisible.

VIII. The real watershed isn’t OpenClaw. It’s what it forces us to talk about next.

Closing

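I. OpenClaw is not a capability breakthrough but a permissions unlock

II. Local deployment creates an “illusion of control”

III. The capability explosion comes from “composition,” not breakthrough

IV. Big-platform restraint versus the recklessness of individual developers

V. Risk may spread faster than governance can keep up

VI. The real challenge for agents is execution safety, not model safety

VII. The structure of software may be rewritten

VIII. We are standing at a tipping point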

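I. A six-layer structure, from top to bottom

II. General Agent: entry point and dispatcher

III. Special Agent: task-domain expert

IV. App: a human-facing functional unit

V. Skill: the capability declaration layer

VI. Plugin: the execution wrapper layer

VII. API: the underlying capability interface

VIII. Who is the “App” of the new era?

IX. From Plugin to Skill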

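I. What the App Store really sells is not apps but “the right to be the entry point”

II. The fatal change of the agent era: the app is no longer the entry point

III. The App Store won't collapse, but it will be “hollowed out”

1️⃣ Fewer UI apps

2️⃣ The commission logic is challenged

3️⃣ A “skill marketplace” replaces the “app marketplace”

IV. The real conflict: who controls the default agent?

V. Why are big platforms so cautious about pushing agents?

VI. The final question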

(1) Visual Feedback Layer (Visualization Layer)

(2) Approval and Confirmation Layer (Approval Layer)

(3) Monitoring and Audit Layer (Audit Layer)

(4) A more concrete example