The Agent Era: The Contemporary Evolution from Chatbots to Digital Agents

Manus is a new AI agent developed by the Chinese startup Monica, claiming to be the world's first fully autonomous AI agent. It's designed to handle complex tasks independently after an initial user prompt, such as sorting résumés, analyzing stock trends, and generating interactive websites. Currently, Manus is in a private testing phase, accessible by invitation only.

Unveiling 2025's Hottest AI Application Form

The recent surge of interest in Manus, billed as the first general-purpose agent product, has brought the AI industry buzzword "agent" to the public's attention; at the very least, it has educated and inspired the market. Manus's beta demos have been impressive, offering a glimpse of what agent technology can achieve. Whether Manus represents a genuine breakthrough or merely well-marketed hype, everyone is now curious about the emerging era of large language model agents. But what exactly is an agent?

I. From Co-pilot to Pilot: The Evolution Code of Agents

When ChatGPT exploded onto the scene, humanity realized for the first time that AI could not only answer questions but also handle all kinds of knowledge tasks (translation, summarization, writing, you name it) as your "cyber assistant". Early Copilot-type assistants functioned like diligent interns: obedient and responsive, answering when asked and acting when commanded. Today's Agents have evolved into "digital employees" capable of working out solutions to problems independently. They are no longer passive assistants waiting for instructions, but intelligent agents that can autonomously plan, break down tasks, and use tools.

    • Copilot mode: You command "write an English email," it generates text and waits for you to confirm or use it
    • Agent mode: You say "resolve the customer complaint within budget x," and it automatically retrieves order data → analyzes the problem → generates a solution → orders compensation gifts within budget → synchronizes the resolution record with your CRM system
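
The contrast between the two modes can be sketched in a few lines of Python. This is an illustrative toy only, not how Manus or any real product works: the function names (`copilot`, `agent`), the step functions, and the order data are all hypothetical, and the "tools" return canned values instead of calling a real order system or CRM. A real agent would also choose its steps dynamically rather than follow a fixed list.

```python
from typing import Callable, Dict, List

def copilot(prompt: str) -> str:
    """Copilot mode: produce one draft, then stop and wait for the human."""
    return f"DRAFT for: {prompt}"

# Hypothetical tools. Each one reads the shared state and returns updates to it.
def fetch_order(state: Dict) -> Dict:
    return {"order": {"id": "A-1", "item": "kettle", "price": 40}}

def analyze(state: Dict) -> Dict:
    return {"issue": f"{state['order']['item']} arrived damaged"}

def propose(state: Dict) -> Dict:
    # Never spend more than the budget given with the goal.
    gift = min(state["budget"], 0.25 * state["order"]["price"])
    return {"gift_value": gift}

def record_in_crm(state: Dict) -> Dict:
    return {"crm_note": f"{state['issue']}; compensated {state['gift_value']}"}

PIPELINE: List[Callable[[Dict], Dict]] = [fetch_order, analyze, propose, record_in_crm]

def agent(goal: str, budget: float) -> Dict:
    """Agent mode: run every step without asking for confirmation in between."""
    state: Dict = {"goal": goal, "budget": budget, "log": []}
    for step in PIPELINE:
        state.update(step(state))
        state["log"].append(step.__name__)
    return state

result = agent("resolve the customer complaint", budget=8)
print(copilot("write an English email"))
print(result["log"], result["crm_note"])
```

The point of the sketch is the control flow: `copilot` returns after a single generation, while `agent` carries a shared state through several tool calls and only reports back at the end.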

This qualitative leap stems from three major technological breakthroughs:

    1. Extended context windows: New LLMs can hold up to 1 million tokens in context (on the order of the entire Harry Potter series), building continuous working memory
    2. Reasoning engine: Evolution from simple Chain-of-Thought to Tree-of-Thought reasoning, enabling multi-path decision making
    3. Digital limb growth: API calls + RPA (simulating human software operation) + multimodal input/output allowing AI to truly "take action" without human intervention during the process
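
To make the second point concrete, here is a minimal sketch of the difference between following one chain of steps and keeping several branches alive. It is a toy under loud assumptions: in a real Tree-of-Thought setup an LLM proposes and scores the candidate "thoughts", whereas here the thoughts are two arithmetic operations and the score is simply the distance to a target number. The names `OPS` and `search` are made up for this example.

```python
from typing import Callable, List, Tuple

Op = Tuple[str, Callable[[int], int]]
OPS: List[Op] = [("+3", lambda x: x + 3), ("*2", lambda x: x * 2)]

def search(start: int, target: int, depth: int, width: int) -> Tuple[int, List[str]]:
    """Keep the `width` most promising partial paths at each step.

    width=1 follows a single chain; width>1 explores several branches.
    """
    frontier: List[Tuple[int, List[str]]] = [(start, [])]
    for _ in range(depth):
        candidates = [
            (fn(value), path + [name])
            for value, path in frontier
            for name, fn in OPS
        ]
        # Score each candidate by how close it is to the target.
        candidates.sort(key=lambda c: abs(target - c[0]))
        frontier = candidates[:width]
    return frontier[0]

chain = search(start=1, target=10, depth=3, width=1)
tree = search(start=1, target=10, depth=3, width=2)
print("single chain:", chain)   # ends at 11: the greedy path overshoots
print("tree, width 2:", tree)   # reaches 10 via +3, +3, +3
```

With a single chain, the locally best step at depth two (doubling 4 to 8) rules out the exact answer; keeping one extra branch is enough to recover it. That is the intuition behind multi-path decision making.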

II. The Seven Weapons of Agents: Beyond Conversational AI

The combat power of today's top Agents comes from a "technical LEGO set" composed of seven core components:

① Search+RAG

    • Real-time capture of the latest information via built-in search: stock quotes, flight status, academic frontiers
    • Connection to enterprise knowledge bases: instant access to employee manuals, product specifications, customer profiles
    • Case study: A medical Agent can simultaneously retrieve the latest clinical guidelines and patient medical history during diagnosis
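
The retrieve-then-generate pattern behind this component can be sketched as follows. Everything here is a placeholder: the three knowledge-base entries are invented, the ranking is plain word overlap rather than a real search engine or embedding model, and `retrieve` and `build_prompt` are hypothetical helper names. The final call to a language model is left out.

```python
import re
from collections import Counter
from typing import List, Tuple

# Made-up enterprise knowledge base for illustration only.
KNOWLEDGE_BASE = {
    "employee_manual": "Expense forms are due within 30 days of travel.",
    "product_spec": "The kettle holds 1.5 liters and has a 2 year warranty.",
    "customer_profile": "The customer prefers email contact and bought a kettle in March.",
}

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Rank documents by word overlap with the query and return the top k."""
    q = tokens(query)
    ranked = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: sum((q & tokens(item[1])).values()),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Put the retrieved passages in front of the question."""
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(query))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"

print(build_prompt("How long is the kettle warranty?"))
```

The model never sees the whole knowledge base, only the few passages that look relevant to the question, which is what keeps answers current and grounded.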

② Coding Capabilities

    • Automatically writing scripts to process Excel files
    • Transforming into a "digital developer" during debugging
    • Even developing complete applications
    • Impressive demonstration: During testing, a Windsurf Agent independently wrote a webpage with login/payment functionality

③ Software Operation (Computer Use)

    • No API? RPA can still simulate human operations directly!
    • Operates browsers, Photoshop, and OA systems just like a human would
    • Game-changing scenario: An Agent autonomously completing the entire workflow from flight price comparison → booking → filling expense forms

④ Memory Vault (Vector Database)

    • Permanently remembers your work habits: "Director Wang prefers blue templates for Monday morning meeting PPTs"; "Accountant Zhang's reports must retain two decimal places"
    • Localized storage ensures privacy and security
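
A memory vault boils down to two operations: store a note as a vector, and recall the stored note whose vector is closest to the query. The sketch below assumes a deliberately crude "embedding" (word counts) and a hypothetical `MemoryVault` class that lives in process memory; a real system would use a learned embedding model and a vector database, and could persist to local disk.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

def embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryVault:
    """Hypothetical in-memory store; nothing leaves the local process."""

    def __init__(self) -> None:
        self._notes: List[Tuple[Counter, str]] = []

    def remember(self, note: str) -> None:
        self._notes.append((embed(note), note))

    def recall(self, query: str) -> Optional[str]:
        """Return the stored note most similar to the query, if any."""
        if not self._notes:
            return None
        q = embed(query)
        return max(self._notes, key=lambda item: cosine(q, item[0]))[1]

vault = MemoryVault()
vault.remember("Director Wang prefers blue templates for Monday morning meeting PPTs")
vault.remember("Accountant Zhang's reports must retain two decimal places")
print(vault.recall("Which template for the Monday morning meeting?"))
```

Because recall is by similarity rather than exact match, the agent does not need to remember how a habit was originally phrased in order to find it again.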

⑤ Multimodal Capabilities

    • Input and output no longer limited to text:
      • Converting voice meetings into visual minutes
      • Transforming data reports into dynamic videos
      • Generating mind maps while listening to podcasts

⑥ Multi-Agent Collaboration: Complex tasks tackled by "intelligent teams"

    • Commander Agent: Formulates battle plans
    • Scout Agent: Monitors data in real-time
    • QA Agent: Cross-validates results
    • Diplomatic Agent: Requests resources from humans
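
The division of labor among these four roles can be sketched as plain functions passing work to one another. This is a hedged illustration, not a real multi-agent framework: each "agent" is a stub, the metrics and readings are invented, and in practice each role would be backed by its own model calls and tools.

```python
from typing import Dict, List

def commander(goal: str) -> List[str]:
    """Formulate the plan: here, a fixed list of metrics to check."""
    return ["revenue", "churn"]

def scout(metric: str) -> float:
    """Monitor data: a stub that reads from a hard-coded table."""
    readings = {"revenue": 120.0, "churn": -3.0}
    return readings[metric]

def qa(metric: str, value: float) -> bool:
    """Cross-validate: reject readings that cannot be right (negative ones here)."""
    return value >= 0

def diplomat(metric: str) -> str:
    """Ask a human for help when the team cannot resolve something itself."""
    return f"Human input needed: please verify '{metric}'"

def run_team(goal: str) -> Dict[str, List]:
    report: Dict[str, List] = {"accepted": [], "escalated": []}
    for metric in commander(goal):
        value = scout(metric)
        if qa(metric, value):
            report["accepted"].append((metric, value))
        else:
            report["escalated"].append(diplomat(metric))
    return report

print(run_team("prepare the weekly business review"))
```

The useful property is that no single role has to be trusted on its own: the scout's output only reaches the report after the QA check, and anything suspicious is routed to a person.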

⑦ Planning and Reasoning

    • Breaking down vague instructions like "organize a product launch" into 100+ subtasks
    • Dynamically adjusting plans: When a venue is suddenly canceled, immediately activating Plan B
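
Decomposition plus replanning can be sketched with a task queue and a table of fallbacks. The plan, the fallback, and the `run` function below are all hypothetical; a real agent would generate the subtasks and the Plan B with a model instead of looking them up, and the list would be far longer than three items.

```python
from typing import Dict, List, Set

# Hypothetical decomposition of a vague instruction into subtasks.
PLANS: Dict[str, List[str]] = {
    "organize a product launch": ["book venue", "invite press", "prepare demo"],
}
# Hypothetical Plan B for subtasks that can fail.
FALLBACKS: Dict[str, str] = {"book venue": "book backup venue"}

def execute(task: str, unavailable: Set[str]) -> bool:
    """Stub executor: a task fails if it is marked unavailable."""
    return task not in unavailable

def run(goal: str, unavailable: Set[str]) -> List[str]:
    done: List[str] = []
    queue = list(PLANS[goal])
    while queue:
        task = queue.pop(0)
        if execute(task, unavailable):
            done.append(task)
        elif task in FALLBACKS:
            queue.insert(0, FALLBACKS[task])  # switch to Plan B, keep the rest of the plan
        else:
            raise RuntimeError(f"no fallback for {task!r}")
    return done

print(run("organize a product launch", unavailable={"book venue"}))
```

When the venue falls through, only the failed step is swapped out; the remaining subtasks proceed in their original order.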

III. The Bipolar War in the Agent Universe

The agent landscape is currently witnessing a "generalist vs. specialist" showdown:

Generalist Camp

    • Key players: Manus, and possibly GPT-5 (rumored to integrate all capabilities)
    • Advantages: Universal capabilities—coding, designing, project management all in one
    • Potential risks: Vulnerability to disruption by tech giants (for example, GPT-5 or DeepSeek R3 potentially crushing Manus)

Specialist Camp Lineup:

    • Medical Agents: AI doctors capable of examining CT scans, making diagnoses, and writing prescriptions
    • Legal Agents: Generating flawless contracts in three minutes
    • Financial Agents: Trading operators monitoring 37 global exchanges in real-time
    • Moat: Industry know-how + dedicated toolchains creating competitive barriers

IV. Hopes and Concerns in the Agent Era

On the Eve of Breakthrough:

    • Technical infrastructure largely in place (sufficiently long context + mature toolchain)
    • Multimodal large language models filling the final gaps
    • 2025 potentially becoming the true "Year of the Agent"

Undercurrents:

    • Privacy concerns: Agents requiring deep access to user data
    • Ethical dilemmas: Who bears responsibility when an Agent books a hotel without explicit approval?

V. The Future Has Arrived: A New Paradigm of Human-Machine Collaboration

Agents are gradually mastering their ultimate skills:

Predictive capability: Anticipating your needs in advance ("Rain detected tomorrow, outdoor schedule modified")

Embodiment: Robots infused with "souls" executing physical actions autonomously (Robot + Agent = Robot butler)

Humans are finally entering an era where "the noble speaks but doesn't lift a finger": humans set goals, while Agents handle all implementation details and solution paths. This quiet efficiency revolution is set to reshape the rules of the game across every industry.

The only question is: Are you ready to embrace your digital colleague?

 

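Posted by

立委

Dr. 立委 is the former VP of Engineering for the large-model team at 出门问问 (Mobvoi), focusing on large language models and their AIGC applications. He previously spent 10 years as Chief Scientist at Netbase, where he led the development of language understanding and application systems for 18 languages: robust, linear-speed, scaled up to social media big data, and grounded semantically in public-opinion mining products, making the company a front-runner in industrial NLP deployment in the US. Before that, he spent eight years as VP of R&D at Cymfony, where he won first place in the first question-answering system evaluation (TREC-8 QA Track) and won 17 Small Business Innovation Research projects in information extraction (PI for 17 SBIRs).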
