When Agents Take Over the Entry Points: Search Moves to the Background, and Your Phone and Car Start “Doing Things for You”
In 2025, multimodal models and agents pushed AI from “answering questions” to “getting things done.” Suno v5 made it possible for ordinary people to reliably produce loop-worthy music; Sora 2 moved video generation from isolated clips toward storyboarded narratives with consistency; and widespread day-to-day use of FSD made the safety conversation feel closer to apples-to-apples. A deeper shift is happening at the entry points: Google, OpenAI, and Perplexity are driving search toward LLM-native experiences; the Doubao phone assistant makes OS-level GUI agents tangible; and Tesla’s in-car Grok signals a transition from rigid voice commands to a true guide agent. The contrast between rapidly improving capability and the slow pace of closing the commercial loop remains stark—but the bottleneck is moving from “model capability” to “system-level deployment.”
I’ve been in AI/NLP for a long time—embarrassingly long, perhaps fossil-level. But what shocked me most in 2025 wasn’t that “benchmarks got better again.” It was something more concrete: I repeatedly confirmed in everyday life that high-bar capabilities—things that used to require expert training and experience—are being productized, democratized, and then distributed at scale to ordinary users.
If we pull “AGI” down from metaphysics into an operational definition—the ability to stably match or approach human-expert output across multiple task domains—then 2025 is the first year I genuinely felt “partial AGI scenarios” landing in the real world: music creation, short-form video creation, autonomous driving, and a deeper entry-point revolution (search, phone interaction, in-car voice) all resonating at the same time.
As 2025 closes, I want to connect these seemingly separate shifts with one unified framework—and add a few personal “micro-cases,” so the argument doesn’t stop at industry buzzwords.
0) From “Information Entry Points” to “Action Entry Points”: Agents Are Eating the GUI
What truly happened in 2025 is a migration in how we enter and operate systems:
- The old internet was GUI-driven: click buttons, open apps, hunt menus, click links.
- More and more critical scenarios are becoming intent-driven: you say what you want; the system decomposes the task, calls tools, produces results, and continues executing.
If you break this into an “agent stack,” you’ll notice that music, video, driving, search, phone, and in-car experiences are all converging on the same engineering blueprint (a minimal sketch follows the list):
- Intent capture: natural language / voice / image becomes the primary interface
- Planning & decomposition: turn goals into executable multi-step tasks
- Tool use: call APIs—or directly operate across apps via the GUI (a GUI agent)
- Memory & personalization: continuously learn preferences, habits, and context
- Verification & governance: results are checkable, traceable, and risk-controlled
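Here is that sketch: a minimal agent loop in Python. Everything in it (the `Agent` class, the toy `decompose` and `verify` methods, the two lambda tools) is a hypothetical placeholder for one layer of the stack above, not any vendor’s real architecture.

```python
# A deliberately tiny, illustrative agent loop; all names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]                 # tool use: APIs or GUI actions
    memory: dict[str, str] = field(default_factory=dict)   # personalization / context

    def handle(self, utterance: str) -> list[str]:
        intent = utterance.strip().lower()                  # 1) intent capture (stub)
        plan = self.decompose(intent)                       # 2) planning & decomposition
        results = []
        for tool_name, arg in plan:
            outcome = self.tools[tool_name](arg)            # 3) tool call
            if not self.verify(outcome):                    # 5) verification & governance
                outcome = f"escalated to human: {tool_name}({arg})"
            results.append(outcome)
        self.memory["last_intent"] = intent                 # 4) memory & personalization
        return results

    def decompose(self, intent: str) -> list[tuple[str, str]]:
        # Trivial planner: food-related intents become a search-then-book plan.
        if "restaurant" in intent:
            return [("search", intent), ("book", "top result")]
        return [("search", intent)]

    def verify(self, outcome: str) -> bool:
        # Governance hook: a real system would check policy, permissions, and risk here.
        return bool(outcome)


agent = Agent(tools={
    "search": lambda q: f"search results for '{q}'",
    "book": lambda x: f"booked {x}",
})
print(agent.handle("Recommend a restaurant for Friday night"))
```

The point is not the toy logic; it is that music, video, driving, search, phone, and in-car products are all filling in these same five slots with progressively stronger components.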
Once you see this, you realize: whoever owns the entry point rewrites the ecosystem. “Search moving to the background,” “the app economy being redefined,” and “in-car assistants evolving into guide agents” are not three different stories—they are three battlefields of the same story.
1) The Democratization of Music: Suno v5 Lets Even the “Musically Illiterate” Produce Loop-Worthy Songs Reliably
Suno’s version timeline mirrors 2025: v4 (2024.11) → v4.5 (2025.05) → v5 (2025.09). The company framed v5 as its most important technical leap—more coherent structure, more professional transitions, more natural audio and vocals—and rolled it out to Pro/Premier in late September 2025.
My micro-case: The “car-loop” bar gets broken
My personal metric is extremely simple: can it loop in the car without getting annoying? And it can’t just be self-indulgence—the loop has to survive real-world validation (starting with friends, family, and followers).
In 2025, with surprisingly little effort, I used Suno to create multiple loopable pop ballads and melancholic tracks. One of them is a breakup scene set at “the last train, the platform”—a style you probably recognize: catchy but not overstimulating, repetitive but not mechanical, sadness falling like a steady, continuous drizzle. The key wasn’t “producing a melody.” It was that I could use plain language to tune the structure into a stable region—narrative density in the verses, emotional lift in the chorus, the turning force of the bridge—iterating until it stayed listenable on repeat.
Historically, that was almost unthinkable for non-musicians. And even among experienced composers, truly loop-worthy—and widely loved—songs are the exception, not the norm.
The real meaning of music democratization isn’t “everyone becomes a musician.” It’s that ordinary people can reliably acquire a meaningful slice of a musician’s productivity—and experience the joy of creation and acceptance.
2) The Democratization of Video: Sora 2 Turns “Imagination” into “Deliverable Footage”
OpenAI released Sora 2 on September 30, 2025, positioning it as a flagship model with greater controllability, realism, stronger physical plausibility, and synchronized generation for video and audio (dialogue and sound). It also promoted an LLM-native “TikTok-like” creation flow via the Sora app and sora.com, along with social/entertainment community building.
My micro-case: From “cool clips” to storyboarded narrative production
I tried creating a short piece: an ’80s campus romance. Not a single flashy shot, but a storyboard organized into multiple scenes—opening mood, the first encounter, emotional progression, conflict and pause, and a final look-back. In earlier video models, the biggest pain point was character consistency and cross-scene consistency: you could easily get beautiful shots, but it was hard to carry the same person reliably across scenes.
In 2025, I felt—for the first time—that this started becoming “engineering-solvable.” You can treat it as a production pipeline rather than gambling on luck. Sora 2’s productization (clonable digital actors, controllable rendering, tooling, and distribution entry points) is pushing that transformation.
Short-form video’s red ocean gets reshaped not because AI merely boosts efficiency, but because it expands the boundary of what imagination can actually deliver.
3) The Democratization of Autonomous Driving: FSD v14 and a Turning Point in Comparability
To me, the release of Tesla FSD v14 marks autonomous driving as effectively a “solved problem” in the sense that ordinary users can verify the experience for themselves. Since the end-to-end breakthrough with FSD v12 more than a year ago, the system has shown accelerating improvement, culminating in v14. My own experience, and feedback from many former skeptics after trying it, points in the same direction: FSD is at or above seasoned-driver level, not only smooth but also (per Tesla’s own statistics) significantly safer than average human driving; Tesla’s latest narrative often cites figures like “7× fewer accidents.” Tesla also describes “unsupervised” driving as beginning limited deployments in parts of Texas, and positions the Cybercab, purpose-built for robotaxi service, as scaling production in 2026: no steering wheel, no pedals, relying entirely on FSD, with extremely rare edge cases potentially requiring remote intervention (on the order of once per ~10,000 miles at most).
My micro-case: “Near-100% FSD usage” creates an apples-to-apples intuition
I used to be a skeptic—at least a sympathetic skeptic. The reason is straightforward: Autopilot-era statistics are easy to challenge. In complex, risky, or ambiguous situations, humans are more likely to disengage the system and take over, making “it’s safer when enabled” look better than it truly is—a potentially misleading selection bias.
But today, the driving behavior of many owners (including me) has changed structurally. In the first few days after getting a new car, I used FSD for almost 100% of my mileage—except small maneuvers like intervening when I didn’t like the parking spot FSD chose and explicitly directing it to a specific space. Under this “coverage near saturation” usage pattern, selection bias shrinks significantly—at least in the sense of the same driver, same car, same living radius, which feels much closer to apples-to-apples. That’s why, when Tesla continues to use “fewer collisions than the U.S. average (e.g., 7×)” as part of its narrative, my intuition about comparability is stronger than it used to be—while fully acknowledging that the broader methodological debate still exists, and should.
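To spell out the selection-bias point with a back-of-the-envelope decomposition (my own illustration, not Tesla’s published methodology): let $s$ be the share of miles driven with the system engaged, and $r_{\text{engaged}}$, $r_{\text{disengaged}}$ the accident rates on those two slices of mileage. The driver’s overall rate is

$$
r_{\text{overall}} = s \, r_{\text{engaged}} + (1 - s) \, r_{\text{disengaged}}.
$$

If drivers disengage precisely in the hardest situations, $r_{\text{engaged}}$ can look far better than $r_{\text{overall}}$ even if the system adds no safety at all, because the engaged miles are the easy miles. But as $s \to 1$, the engaged-mile rate converges to the driver’s overall rate, and comparing it against a human baseline gets much closer to apples-to-apples.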
The shift in 2025 isn’t that “the controversy disappears.” It’s that changes in user behavior begin to weaken the most central controversy point (selection bias). The discussion naturally moves from “can it?” to “where are the boundaries, and how do we govern it safely?”
4) Search Is Destined to Be Rebuilt: The Old Keyword-Auction World Starts to Loosen
“Search moving to the background” is not about dismissing search; it’s about search evolving from “link retrieval” to an entry point that fuses answers with action.
Three threads converged in 2025:
- Google’s self-revolution: expanding AI Overviews and introducing the experimental AI Mode—explicitly pushing search toward a more conversational, integrated “Q&A + reasoning + browsing” experience. It has even introduced more personalized result rendering, calling it “generative UI.”
- OpenAI entering the core search/browsing arena: ChatGPT Search was updated on Feb 5, 2025 to be available to everyone in supported regions without sign-in, and the company kept accelerating its productization and capabilities.
- Perplexity capturing “answer engine” mindshare while triggering copyright conflict: late-2025 lawsuits (e.g., the Chicago Tribune case) put the impact of LLM-native search on content ecosystems and traffic allocation directly under the spotlight.
What truly destabilizes the old business model is this: when users stop “clicking links” and instead get a first-screen AI result that they can keep interrogating and can directly act on, keyword auctions won’t vanish overnight—but they will be forced to migrate into new placements and new attribution logic. Google itself is actively discussing marketing and reach strategies under AI Mode / AI Overviews, which is essentially a self-cannibalizing transformation of the business model.
5) Phone Use: Doubao Makes OS-Level GUI Agents Tangible—and It Feels Like a One-Way Road
The “phone use revolution” matters because it touches the foundation of the app economy.
In early December 2025, ByteDance released the “Doubao Phone Assistant (technical preview)” and partnered with phone makers, describing it as an OS-level capability: it can see the screen, use apps, and execute cross-app tasks (organize files, fill forms, recommend restaurants, etc.). In essence, it upgrades “operating a phone” from click workflows to an intent-driven interactive GUI agent. Voice is the most natural interface for humans—our devices should not be the exception.
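To give “intent-driven GUI agent” some shape, here is a schematic observe-decide-act loop in Python. This is my own sketch under stated assumptions (a stubbed screen parser and a toy action policy), not ByteDance’s implementation; every function name is hypothetical.

```python
# Schematic observe-decide-act loop of an OS-level GUI agent. A real system
# would back these stubs with a screen parser, a multimodal policy model, and
# an OS-sanctioned action API gated by user permissions.
from dataclasses import dataclass


@dataclass
class ScreenState:
    app: str
    elements: list[str]                  # parsed UI elements on the current screen


def observe() -> ScreenState:
    # Stub: pretend we parsed the current screen into named elements.
    return ScreenState(app="files", elements=["Downloads", "Sort by date", "Select all"])


def decide(intent: str, state: ScreenState, done: set[str]) -> str | None:
    # Stub policy: tap each on-screen element mentioned in the intent, once.
    for element in state.elements:
        action = f"tap('{element}')"
        if element.lower() in intent.lower() and action not in done:
            return action
    return None                          # nothing left to do


def act(action: str) -> None:
    print(f"executing {action}")         # would dispatch a real tap/type/scroll event


def run(intent: str, max_steps: int = 10) -> None:
    done: set[str] = set()
    for _ in range(max_steps):
        state = observe()
        action = decide(intent, state, done)
        if action is None:
            break
        act(action)
        done.add(action)


run("Open Downloads and sort by date")
```

The consequential part is less the loop than where it runs: an assistant confined to one app can only suggest, while an OS-level agent can actually execute across apps.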
My micro-case: Why it feels irreversible
The shock of an OS-level agent isn’t “yet another assistant.” It’s that it changes the behavioral economics. Once you’ve experienced “say one sentence and it runs dozens of clicks for you,” it’s hard to tolerate the old mode. Apps stop being “products directly operated by users” and start looking more like backend services invoked by agents.
Once that path is real, competition in the phone industry shifts from “hardware specs + app ecosystem” to “OS-level agent capability + tool authorization frameworks + safety governance.”
6) In-Car Voice: From “Artificial Stupidity” to a Real-Time Guide Agent
Historically, in-car assistants had one fatal flaw: they could only execute single, rigid commands—no multi-turn dialogue, little context, and no “dynamic navigation and explanation.” Many people call them “artificial stupidity.”
Tesla’s 2025 Holiday Update indicates that Grok can add/edit navigation destinations, handle multiple destinations in a single instruction, support multi-turn interaction, and dynamically update the route.
This may sound small, but it’s a big signal: in-car voice is evolving from a one-command “menu remote” into a guide agent—it can understand intent, sustain dialogue, and revise plans in real time. Combine it with FSD, and you get a sharper thesis:
- FSD answers: how the car drives.
- The in-car voice agent answers: where you’re going, why, and what you can do along the way.
- Together, they form the early shape of a complete mobility agent.
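At the interface level, “guide agent” mostly means that route editing becomes a small set of tools the voice agent can call across conversational turns. The sketch below is my own assumption of what such a tool surface could look like; none of these names correspond to a real Tesla or xAI API.

```python
# Hypothetical route-editing tools for an in-car guide agent: "add a stop",
# "drop a stop", and "replan" become calls the voice agent can make mid-drive.
from dataclasses import dataclass, field


@dataclass
class Route:
    stops: list[str] = field(default_factory=list)

    def add_stop(self, place: str, position: int | None = None) -> None:
        # A single utterance may add several stops in a chosen order.
        if position is None:
            self.stops.append(place)
        else:
            self.stops.insert(position, place)

    def remove_stop(self, place: str) -> None:
        self.stops.remove(place)

    def replan(self) -> str:
        # A real system would hand the stop list to the navigation stack here.
        return " -> ".join(self.stops) if self.stops else "(no destination)"


route = Route()

# Turn 1: "Take me to the museum, but stop for coffee on the way."
route.add_stop("museum")
route.add_stop("coffee shop near current location", position=0)
print(route.replan())

# Turn 2, mid-drive: "Actually, skip the coffee."
route.remove_stop("coffee shop near current location")
print(route.replan())
```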
7) Coding Agents and Deep Research: The White-Collar Skill Crash Test Zone
By late 2025, coding agents that match junior-engineer throughput, and deep-research systems that approach or exceed a senior analyst’s output and quality, are no longer a question of “can it?” The real question is how to integrate them, validate them, and assign accountability so that the returns actually materialize.
Their common thread with the search/phone/in-car shifts is still the same framework: from information to action, from tool to workflow.
8) The Most Painful Contrast: Capability Explodes, but the Commercial Loop Still Isn’t Closing at Scale
In 2025, MIT-related research was widely cited in the media for the finding that “95% of GenAI pilots failed to produce measurable ROI,” with the causes pointing to integration, priority mismatch, and organizational friction.
1) Enterprise context is genuinely hard: RAG alone won’t save you
Enterprise context is not a knowledge base—it’s a living system: permissions, responsibility chains, exception processes, inconsistent definitions, legacy systems, and tacit rules. Many failures aren’t “the model can’t answer,” but “the system doesn’t know which path to follow, which interface to call, or who must sign off.”
2) Organizational friction is equally hard—and often more decisive than tech
Even if context can be engineered, enterprises may still resist letting agents take over: compliance, security, procurement, legal, and IT gates can grind a promising demo to a halt. Add to that KPI risk, reputational risk, and dependence on legacy workflows.
So the confusion remains: is context engineering still insufficient, or is it organizational inertia and human barriers? The more accurate answer now is: both, compounded. And in the next few years, the latter (organizational friction) may be even more decisive for deployment speed.
My own view: the commercial loop won’t close simply by waiting for stronger models. It requires turning agents into deliverable systems that satisfy three conditions (a minimal sketch follows the list):
- Verifiable (metrics and replay)
- Accountable (responsibility chains and permissions)
- Embeddable (fits into existing workflows)
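To make “verifiable, accountable, embeddable” concrete, here is the sketch mentioned above: an action record whose field names are my own hypothetical choices. The idea is that every agent action carries enough metadata to be replayed, attributed, gated, and routed through an existing workflow step.

```python
# Illustrative record for a "deliverable" agent action; field names are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentAction:
    task_id: str
    tool: str
    inputs: dict                  # exact inputs, so the action can be replayed (verifiable)
    output: str
    agent_version: str            # which model / prompt / config produced it (verifiable)
    requested_by: str             # human or service responsible for the request (accountable)
    approver: str | None = None   # sign-off, if the permission policy requires one (accountable)
    workflow_step: str = ""       # where this lands in the existing process (embeddable)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def requires_approval(action: AgentAction, sensitive_tools: set[str]) -> bool:
    # Governance rule: sensitive tools cannot run without a named approver.
    return action.tool in sensitive_tools and action.approver is None


refund = AgentAction(
    task_id="T-1042",
    tool="issue_refund",
    inputs={"order": "A-77", "amount": 120.0},
    output="pending",
    agent_version="agent-v0.3",
    requested_by="support-bot",
    workflow_step="billing/refunds",
)
print(requires_approval(refund, sensitive_tools={"issue_refund"}))  # True: needs sign-off
```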
Closing: 2025 Is the Eve of AGI
AGI won’t arrive as a single thunderclap. It arrives when entry points are taken over by agents—and suddenly the old way of operating the world feels clumsy, expensive, and unsustainable.
What I see as settled after 2025:
- Search is moving from links to answers-and-action, loosening the old traffic allocation logic.
- Phones are moving from a collection of apps to OS-level agent orchestration; apps are redefined as callable services.
- In-car systems are moving from single command execution to dynamic guide agents, and together with autonomous driving they form the early outline of a mobility agent.
Once these entry points solidify, even the question “Why is AI so hard to commercialize?” must be reframed: it’s not “do you have a model,” but “do you have a system.” Not “can you build it,” but “can you own responsibility, validate outcomes, and embed at scale.”