
A Mental Model for Ambient Agents

A mental model for ambient agents: a four-part loop wrapped in a three-part human interface, why coding agents have product-market fit and other domains don't, and the two real blockers I keep landing on.


Ambient agents aren’t actually a new idea. The pieces have existed for years: cron jobs, webhooks, LLM tool use, durable execution. Engineers have been bolting them together since before anyone had a catchy name for it. What does feel new is the experience: software that runs while you’re not watching and only bothers you when it needs to. Harrison Chase at LangChain coined “ambient agents” in a January 2025 post, and the term stuck, more or less. It’s worth being precise about what it means, because most things marketed as “ambient AI” in 2025 aren’t really this.


What it actually is

Chase’s definition is one sentence, which is all a good definition should need. An ambient agent “listens to an event stream and acts on it accordingly, potentially acting on multiple events at a time.” Two things separate it from the chat agents everyone’s already used to: a human didn’t trigger it, and many can run at once.

Ambient doesn’t mean autonomous. Chase is careful about this in the original post and the distinction gets lost constantly. An ambient agent can still stop and ask you something. It can still flag a draft before sending. What makes it ambient is the trigger and the concurrency, not whether you’re involved.

The way I explain it to people outside engineering: a chat agent waits for you, a copilot sits next to you while you work, and an ambient agent acts while you’re at lunch. You moved the trigger from your keyboard to the world.

This is a very old idea

I did some digging into the history of this and found the idea goes back further than most people realise. Mark Weiser at Xerox PARC wrote in 1991 that “the most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” He called it ubiquitous computing. Then in 1998, Eli Zelkha’s team at Palo Alto Ventures coined “ambient intelligence” for a Philips keynote and described environments that were embedded, context-aware, personalised, adaptive, and anticipatory. Read those five properties and you’re reading the spec sheet for an ambient agent, written twenty-seven years before the term existed.

Mark Weiser at Xerox PARC in the early 1990s, demonstrating the ubiquitous-computing setup he sketched in Scientific American, September 1991. Reproduced under fair-use editorial citation.

The point isn’t that LangChain is being derivative. The vision genuinely is old; the engineering to build it has only just arrived. Ambient intelligence was a UX dream sitting around waiting for a probabilistic engine, and now we have one.

What actually changed

For most of the past decade, building a general ambient agent in Weiser’s sense was essentially impossible. You could wire events to actions, but anything resembling judgment had to be hardcoded rule by rule. What changed is that five unglamorous developments converged over the last two years, none of them about raw model capability.

Tool use went from research demo to production primitive. The Model Context Protocol got standardised enough that you can plug an agent into a new system without writing custom glue from scratch. Context windows got long enough to hold real state. Durable execution platforms like Temporal, Inngest, Restate, and LangGraph’s checkpointer solved the unsexy problem of “what happens when this agent has been running six hours and the worker dies.” And inference costs dropped enough that always-on stopped being financially absurd, even if it’s still not exactly cheap.

Honestly, most of the 2023-2024 “agents” wave was just chatbots in a trenchcoat. The unlock for ambient agents was the infrastructure surrounding the model: durable execution, tool standards, cheap inference. That’s what lets you trust a model enough to leave it running unsupervised for hours.

The mental model

An ambient agent is a four-part loop wrapped in a three-part interface.

The loop is the engine. It listens to an event source (a webhook, IMAP, a Slack channel, a cron schedule, a file watcher, a monitoring alert). It thinks, which is the LLM plus tools plus memory. It acts by calling tools that do real things in the world. And it persists by checkpointing state somewhere durable so if the process dies, the next worker picks up where the last one left off.
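To make the loop concrete, here is a minimal Python sketch of the four parts. It assumes a hypothetical event shape (a dict with an `id`), stubs `think` and `act` as plain callables standing in for the LLM and its tools, and uses a local JSON file as the checkpoint. A real deployment would hand persistence to a durable execution platform rather than roll its own.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Iterable


@dataclass
class AgentState:
    # Memory carried across events; checkpointed after every handled event.
    processed_ids: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)


def load_state(path: Path) -> AgentState:
    # Resume from the last checkpoint if one exists (e.g. after a worker died).
    if path.exists():
        return AgentState(**json.loads(path.read_text()))
    return AgentState()


def save_state(path: Path, state: AgentState) -> None:
    # Write to a temp file and rename it over the old one, so a crash
    # mid-write never leaves a half-written checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(state)))
    tmp.replace(path)


def run_loop(
    events: Iterable[dict],
    think: Callable[[dict, AgentState], str],
    act: Callable[[str], str],
    checkpoint: Path,
) -> AgentState:
    state = load_state(checkpoint)
    for event in events:                      # Listen
        if event["id"] in state.processed_ids:
            continue                          # handled before a restart; skip
        decision = think(event, state)        # Think: stub for LLM + tools + memory
        result = act(decision)                # Act: side effects in the world
        state.results.append(result)
        state.processed_ids.append(event["id"])
        save_state(checkpoint, state)         # Persist
    return state


if __name__ == "__main__":
    import tempfile

    ckpt = Path(tempfile.mkdtemp()) / "agent_state.json"
    events = [{"id": "e1", "body": "invoice overdue"}, {"id": "e2", "body": "newsletter"}]
    run_loop(events, lambda ev, st: ev["body"], str.upper, ckpt)
    # A fresh worker pointed at the same checkpoint skips both events.
    state = run_loop(events, lambda ev, st: ev["body"], str.upper, ckpt)
    print(state.results)  # ['INVOICE OVERDUE', 'NEWSLETTER']
```

Note the window between `act` and `save_state`: if the worker dies there, the resumed run repeats the action, which is exactly the idempotency problem the post raises below.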

What makes the loop ambient rather than just automated is the interface. LangChain’s three-pattern taxonomy is the most useful framing I’ve found here. Notify is after-the-fact, for low-stakes reversible actions: the agent did something and is telling you about it. Question means the agent paused mid-run because it needs something from you to continue. Review is for anything high-stakes or irreversible: the agent has prepared an action and needs your approval before it goes.

A four-step loop (listen → think → act → persist) wrapped in a three-pattern human interface. An event source (webhook, IMAP, cron) feeds the loop, which resumes from its checkpoint on crash; the interface is Notify (after the fact, low stakes), Question (paused, needs your input), or Review (approve before it ships).

A cron job that emails you a daily report isn’t ambient. It’s a scheduled batch job with a nice subject line. A cron job that drafts ten emails, sends the seven safe ones on its own, and routes the three risky ones to you for review? That’s ambient.
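The routing decision in that ten-email example can be sketched in a few lines. This assumes the think step emits a hypothetical numeric `risk` score and a `missing_info` flag for each draft, and the `REVIEW_THRESHOLD` value is arbitrary. Real systems would more likely route on the kind of action (anything irreversible, anything touching money) than on a single score.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    NOTIFY = "notify"      # act now, tell the human afterwards
    QUESTION = "question"  # pause the run until the human supplies missing input
    REVIEW = "review"      # prepare the action, hold it for explicit approval


@dataclass
class Draft:
    recipient: str
    body: str
    risk: float                 # hypothetical 0..1 score produced by the think step
    missing_info: bool = False  # e.g. the agent doesn't know which account to bill


REVIEW_THRESHOLD = 0.5  # hypothetical cut-off; tune per domain


def route(draft: Draft) -> Pattern:
    if draft.missing_info:
        return Pattern.QUESTION
    if draft.risk >= REVIEW_THRESHOLD:
        return Pattern.REVIEW
    return Pattern.NOTIFY


def triage(drafts: list[Draft]) -> dict[Pattern, list[Draft]]:
    buckets: dict[Pattern, list[Draft]] = {p: [] for p in Pattern}
    for d in drafts:
        buckets[route(d)].append(d)
    return buckets


if __name__ == "__main__":
    drafts = [Draft(f"user{i}@example.com", "...", risk=0.8 if i < 3 else 0.1) for i in range(10)]
    buckets = triage(drafts)
    # Seven go out on their own (followed by a notification); three wait for review.
    print(len(buckets[Pattern.NOTIFY]), len(buckets[Pattern.REVIEW]))  # 7 3
```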

One engineering detail that catches people out: idempotency. Durable execution means resume-after-crash, which means any node in your graph might re-run from the top. If your “send email” step isn’t idempotent, the first time a worker gets killed mid-loop you’ll send the same email twice. Most hand-rolled implementations get this wrong and find out the embarrassing way.
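A minimal sketch of the usual fix, assuming a hypothetical `transport` callable that actually sends the mail: derive a deterministic idempotency key from the run ID, the step name, and the recipient, and refuse to send twice for the same key. The in-memory set is for illustration only.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class IdempotentSender:
    """Wraps a non-idempotent send so a replayed step becomes a no-op."""

    def __init__(self, transport: Callable[[str, str], None], sent_keys: set[str] | None = None):
        self.transport = transport
        # Illustration only: in production this must be durable storage
        # (e.g. a table with a unique constraint), not an in-memory set.
        self.sent_keys = sent_keys if sent_keys is not None else set()

    @staticmethod
    def key(run_id: str, step: str, recipient: str) -> str:
        # Deterministic: a replay of the same step in the same run yields the same key.
        return hashlib.sha256(f"{run_id}|{step}|{recipient}".encode()).hexdigest()

    def send(self, run_id: str, step: str, recipient: str, body: str) -> bool:
        k = self.key(run_id, step, recipient)
        if k in self.sent_keys:
            return False  # already sent on an earlier attempt
        self.transport(recipient, body)
        self.sent_keys.add(k)
        return True


if __name__ == "__main__":
    outbox: list[tuple[str, str]] = []
    sender = IdempotentSender(lambda to, body: outbox.append((to, body)))
    sender.send("run-42", "notify-customer", "ana@example.com", "Your refund is on its way.")
    # The worker restarts and the step re-runs, with a freshly generated body.
    sender.send("run-42", "notify-customer", "ana@example.com", "Good news: refund issued!")
    print(len(outbox))  # 1
```

The key deliberately leaves out the email body: on a replay the model may regenerate slightly different text, and keying on it would let the duplicate through. A narrow window remains between the transport call and recording the key; closing it means recording both atomically, or passing the key to a downstream API that deduplicates on it, where one exists.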

Who’s actually shipping this

The framework players are selling you the harness. LangChain and LangGraph coined the category and built the reference implementation. OpenAI ships an Agents SDK and background mode on the Responses API. Anthropic’s Claude Agent SDK, with its headless mode and permission-scoped tool use, is positioned as the harness for unattended runs.

Then there are the coding agents, which are ambient agents whether or not their marketing calls them that. Cursor’s Background Agents, Cognition’s Devin with its Slack-first interface, Google’s Jules (which clones your repo into a cloud VM and hands back a PR), GitHub Copilot Workspace, Claude Code in headless mode. They all listen for an issue or a PR, do the work, and return something a human reviews.

Finally there’s the enterprise tier, where Microsoft Copilot Studio, ServiceNow, Salesforce Agentforce, and a long tail of vendors are stamping “ambient AI” on roadmaps with varying amounts of substance behind it.

Coding agents work dramatically better than anything else in this space, and the reason isn’t that LLMs are especially good at code. Their use case has three properties almost no other domain has: code review as a ready-made human-in-the-loop interface, git revert as built-in reversibility, and a test suite as an unambiguous success metric. Take any of those away and the deployment story gets hard fast, which is exactly what we’re watching happen everywhere else.

Where this pays off, and where it doesn’t

There’s a quick gut-check I run before getting too excited about a use case. Is there an actual event stream here, or am I wrapping a cron job around something that’s really just a chat? Does the workflow actually need probabilistic judgment, or would a rules engine do it better and more predictably? And is the cost of a wrong action low, or can the action at least sit behind a review step? Two no’s out of three usually means you wanted a workflow with an LLM call in it, not an ambient agent.
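The heuristic is simple enough to write down as code, which also makes its one gap visible. In this sketch the field names are hypothetical, and the one-“no” branch is an assumption, since the rule of thumb only speaks to two or more.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    has_event_stream: bool       # is something in the world actually triggering work?
    needs_judgment: bool         # would a rules engine do this worse?
    wrong_action_is_safe: bool   # cheap to get wrong, or can sit behind a review step


def gut_check(uc: UseCase) -> str:
    nos = [uc.has_event_stream, uc.needs_judgment, uc.wrong_action_is_safe].count(False)
    if nos >= 2:
        return "workflow with an LLM call"
    if nos == 1:
        # Assumption: the rule of thumb is silent on a single "no".
        return "borderline: look hard at the 'no'"
    return "ambient agent candidate"


if __name__ == "__main__":
    nightly_report = UseCase(has_event_stream=False, needs_judgment=False, wrong_action_is_safe=True)
    print(gut_check(nightly_report))  # workflow with an LLM call
```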

Where it genuinely pays off, in my experience, is in two types of work. The first is high-volume repetitive operations like support ticket triage, lead routing, intake form processing, and content moderation, where the volume defeats human attention and the per-action stakes are bounded. The second is long-tail enrichment work: CRM data quality, document hygiene, finance reconciliation, all the “we should really fix that” tasks nobody ever gets to but which compound badly when left alone.

The Klarna story is worth going through in detail, because people misread it two different ways. In February 2024, Klarna announced its OpenAI-powered support agent handled 2.3 million chats in its first month, equivalent to 700 full-time agents, and projected a $40 million profit improvement for the year. By May 2025, CEO Sebastian Siemiatkowski admitted to Bloomberg that the all-in approach produced “lower quality” service and they started rehiring humans in a freelance arrangement. The narrative hardened into “AI failed, humans won.” Then on the Q3 2025 earnings call, Klarna said the same agent was doing the work of 853 full-time equivalents and had saved roughly $60 million.

Agentless customer service was the wrong goal from the start. The right architecture was the hybrid. The agent handles Tier 1 volume, humans handle everything else, and the handoff between them is actually designed. Klarna didn’t discover that AI doesn’t work. They ran the experiment everyone should have known to skip, learned the obvious lesson, and ended up with more automation than they started with.

Klarna's deployment isn't a story about AI failing. It's a story about a company trying "agentless" first, learning the hybrid was the right architecture all along, and ending up with an AI doing more work than at launch.

The two things actually blocking this

Across every conversation I’ve had with engineers actually building ambient agents, I keep arriving at the same two blockers. Neither is model capability, which would have been my guess two years ago.

The first is security, specifically indirect prompt injection via untrusted event sources. In the largest public agent red-teaming competition to date, run by Gray Swan and the UK AI Security Institute in March and April 2025, indirect prompt injection attacks succeeded against 22 frontier AI agents at an average rate of 27.1 percent per session. Within 10 to 100 queries, every single tested agent could be made to violate its deployment policy on essentially every targeted behaviour.

Direct attacks (a user typing something malicious) succeeded 5.7 percent of the time. Indirect attacks, where the malicious content arrives inside an email, web page, or document the agent reads, succeeded nearly five times more often. Ambient agents expand the attack surface from “the user’s prompt” to “every event source the agent listens to.” Every email, every calendar invite, every Slack message, every web page fetched is now a potential injection vector. Sam Altman specifically warned against giving ChatGPT Agent broad email access for this reason. We don’t have defences that hold under serious adversarial pressure, and the offence-defence cycle is moving fast in both directions.
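There’s no reliable defence yet, but one mitigation pattern is to track whether untrusted content has entered a run and, once it has, force any side-effecting tool call onto the Review path. The sketch below assumes hypothetical tool names and a caller that already knows which sources are trusted. It doesn’t detect or stop injection; it only keeps a successful one from acting without a human seeing it first.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical allowlist: read-only tools an agent may still call after it has
# read untrusted content. Everything else is held for a human.
READ_ONLY_TOOLS = {"search_docs", "summarise_thread"}


@dataclass
class RunContext:
    tainted: bool = False                  # has untrusted content entered the model's context?
    sources: list[str] = field(default_factory=list)

    def ingest(self, source: str, trusted: bool) -> None:
        self.sources.append(source)
        if not trusted:
            self.tainted = True            # taint is sticky for the rest of the run


def requires_review(ctx: RunContext, tool_name: str) -> bool:
    # Does not detect injection; it limits what a successful one can do unseen.
    return ctx.tainted and tool_name not in READ_ONLY_TOOLS


if __name__ == "__main__":
    ctx = RunContext()
    ctx.ingest("team_runbook", trusted=True)
    print(requires_review(ctx, "send_email"))  # False
    ctx.ingest("inbound_email", trusted=False)
    print(requires_review(ctx, "send_email"))  # True
```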

Average attack success rate per session against 22 frontier AI agents (Gray Swan / UK AI Security Institute red-team competition, March-April 2025).

The second blocker is trust, which is the cultural version of the security problem. Even when the technology works, people are reluctant to grant write access to a system they can’t watch in real time, and Klarna lived this over eighteen months. Gartner predicts that by 2027, half of all companies that cut customer-service jobs due to AI will rehire staff under different titles. In Gartner’s October 2025 survey, only 20 percent of customer-service leaders had actually reduced headcount because of AI in the first place. The pattern I keep seeing is: replace humans with AI, watch quality drop in the long tail, quietly rehire, leave the AI on the high-volume work it handles well, and land at the hybrid you could have started with.

Both blockers point the same direction. Ambient agents won’t replace chat. The two are going to coexist, and the boundary between them is itself becoming a product decision. Chat stays where you want the model focused and attentive, high-trust interactions where the stakes are too high to fully delegate. Ambient owns the high-volume tail where stakes are bounded and actions are reversible or gated. The teams building the most valuable systems aren’t picking a side. They’re designing the handoff between the two as a first-class concern, with explicit notify-question-review boundaries and an honest accounting of which decisions belong where.

Weiser wanted computers to disappear. We’re now building computers that disappear, hold our credentials, take actions on our behalf, and read attacker-controlled content from event streams they don’t fully trust. That’s three different threat models layered on top of each other, and we don’t have a good answer for any of them yet. The ambient-intelligence vision is twenty-seven years old, and the implementation is finally here. The hard part, making any of this calm, trustworthy, and actually ours, is still ahead.

