How is AI-as-operator different from AI-as-controller (tool-calling)?

In the controller model, the AI sits above the runtime and dispatches disposable tools beneath it through a structured call protocol like MCP over JSON-RPC; humans are outside the execution path, reading transcripts and approving or denying tool calls. In the operator model, the AI is inside the same execution boundary as humans and devices, co-participating in one live state. Tool dispatch is orchestration — call a tool, get a result, the interaction is over. Operator participation is co-participation — a human can reach into the same live shell the AI is using, take over, and hand control back without restarting anything. Different layer, different invariants.

Is the operator model safer than the controller model?

Not inherently — for prompt injection it is, if anything, more exposed. The controller model has a coarse chokepoint: every effect passes through a per-call tool-dispatch gate a policy can inspect and veto. The operator model dissolves that gate, so a subverted operator gets a direct write to shared state, raising the injection blast radius. The operator model does not remove the safety obligation; it relocates it into two substrate-level requirements: per-actor permission envelopes that bound what each operator may do, and a pre-commit gating seam that evaluates an effect before it lands. It also requires a substrate-enforced, non-cooperative halt so a human can always seize control rather than waiting for the agent to yield.

Part 4 of 7 · The Execution-State Continuity Layer

AI as Operator, Not Controller: The Multi-Actor Execution Model

May 25, 2026 · 23 min read

TL;DR

The operator model puts humans, AI agents, devices, and services inside one shared live execution — they attach as operators to a single-homed, ownerless execution state and observe-and-mutate it through the same interface, with asymmetric-but-transferable authority. This is the inverse of the controller model, where the AI sits above the runtime and dispatches disposable tools beneath it. Tool-calling (MCP, JSON-RPC) is the right abstraction for tool dispatch; it is the wrong one for co-participation, where a human can reach into the same live shell the AI is using, take over, and hand control back without restarting anything. The relocation of the AI from on-top to inside the execution boundary is what makes fluid human-in-the-loop, multi-agent coordination, and devices-as-actors possible — and it relocates, rather than removes, the safety obligation the controller’s tool-dispatch gate used to discharge.

There is a mental model baked into almost every AI agent built today, and it is so pervasive that most engineers never notice it is a choice. The model sits on top of the system. It reasons, it decides, and then it reaches down through a tool interface to make things happen. Shells, browsers, file systems, databases — all of them are tools, and the AI is the thing that calls them.

This is a powerful and correct abstraction for a specific job: tool dispatch. Call it the controller model: a model dispatches tools beneath it. It is also the wrong abstraction for a different job, co-participation in live execution, where an AI is not the orchestrator above the machine but one actor among several that operate the same running state. Call that the operator model: each actor is an operator, a participant inside one live execution, rather than a dispatcher of tools beneath it. (“Operator” here is the human-factors sense, an actor who operates a live system from inside it, not the Kubernetes Operator pattern, which is itself a controller that reconciles desired state from above. The two are nearly opposite: a Kubernetes operator sits above the system and drives it toward a target; an execution operator sits inside the running state and shares it.) The stance is not new: the human-factors literature named it decades ago, in Sheridan’s supervisory control (1970s–90s) and Horvitz’s mixed-initiative interaction (1999). What is new is the substrate: a single-homed, identity-bearing live execution object that humans and autonomous agents operate concurrently under uniform mechanics. Supervisory control meant one human supervising one machine, not many heterogeneous actors sharing one coherent OS-level execution state.

This article argues that those are two different architectural layers, that conflating them is the source of a recurring class of design pain, and that the systems converging on the second model are pointing at a category that deserves to be named explicitly.

Two mental models, drawn out

Start with the dominant one. In the controller model (AI-on-top), the model is the controller. Everything below it is a tool it invokes, typically through a structured call protocol. The Model Context Protocol (MCP), introduced by Anthropic, is the cleanest modern expression of this: the model’s tool calls are carried as JSON-RPC 2.0 messages to servers that expose capabilities, and those servers do the work and return results.
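To make the shape of a dispatched call concrete, here is a minimal sketch of one request and its response. Only the envelope follows JSON-RPC 2.0 and MCP's `tools/call` method; the tool name `run_command`, its arguments, and the result text are hypothetical, and this is not a working MCP client.

```python
import json

# Hypothetical tool name and arguments. Only the envelope shape follows
# JSON-RPC 2.0 and MCP's `tools/call` method.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "run_command", "arguments": {"command": "ls /tmp"}},
}

# A server answers with a result keyed to the same id. The content here
# is made up for illustration.
response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {
        "content": [{"type": "text", "text": "a.txt\nb.txt"}],
        "isError": False,
    },
}

# What actually crosses the wire is the serialized request.
wire = json.dumps(request)
decoded = json.loads(wire)
```

Once the response arrives the exchange is complete: nothing in the envelope refers to a running state that outlives the call.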

Two mental models side by side: AI-on-top (the model is a controller dispatching tool calls to shell, browser, DB, MCP; humans outside reading transcripts) versus AI-AS-OPERATOR (human, AI, device, service all attached as operators around one shared execution state)

Now the alternative. In the operator model (AI-as-operator), there is a single live execution state — a running process tree, a PTY, open file descriptors, sockets — and multiple actors operate it. A human at a terminal, an AI agent, a second specialized agent, a monitoring service, a device: each is an actor that observes and submits writes to the same state through the same interface, under the same coherence and ordering rules. They share an interface and an addressing model, not an authority level. Many actors may observe concurrently, but write access to the one shared state is serialized and attributed — at any instant one actor holds the write turn, no actor permanently owns it, and that turn moves between actors with transferable authority. The differentiator is not simultaneous writing; it is attribution, transferable authority, and heterogeneous modality over one live state.
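The serialized, attributed write turn can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any real substrate: `SharedExecution`, `grant_turn`, and `submit` are hypothetical names, and a dictionary stands in for the live process state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedExecution:
    """Toy stand-in for one live execution state (hypothetical names)."""
    env: dict = field(default_factory=dict)
    log: list = field(default_factory=list)   # entries: (seq, actor, change)
    turn_holder: Optional[str] = None
    _seq: int = 0

    def grant_turn(self, actor: str) -> None:
        # The write turn moves between actors; nobody owns it permanently.
        self.turn_holder = actor

    def observe(self) -> dict:
        # Any attached actor may read, whoever holds the write turn.
        return dict(self.env)

    def submit(self, actor: str, key: str, value: str) -> bool:
        # Writes are serialized: only the current turn holder may commit.
        if actor != self.turn_holder:
            return False
        self._seq += 1
        self.env[key] = value
        self.log.append((self._seq, actor, f"{key}={value}"))
        return True


state = SharedExecution()
state.grant_turn("ai-agent")
state.submit("ai-agent", "STAGE", "migrating")
rejected = state.submit("human", "STAGE", "paused")   # human lacks the turn
state.grant_turn("human")                              # authority transfers
state.submit("human", "STAGE", "paused")
```

Both actors use the same `submit` call on the same object; what differs between them is who holds the turn at a given moment, and every committed write records who made it.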

The shift is not “the AI got weaker.” The AI is just as capable inside the box as it was on top of it. The shift is where the AI lives relative to the execution boundary — and that single relocation changes what the system can do.

MCP is right — for what it is

It is worth being precise and fair here, because the easy version of this argument is wrong. MCP and tool-calling are not mistakes to be corrected. They are the correct abstraction for connecting a reasoning model to capabilities it does not itself contain. Standardizing tool dispatch over JSON-RPC so that any model can talk to any tool server, without bespoke per-tool glue, is a genuinely good piece of systems design, and it has become an industry standard for exactly that reason.

The point is narrower and more structural: tool dispatch is orchestration, not co-participation. When a model calls a tool, the tool runs, returns a result, and the interaction is over. The model holds the thread of control; the tool is stateless from the model’s point of view and disposable from the system’s. That is precisely what you want for “fetch this row,” “render this page,” “run this command and give me stdout.”

It is not what you want when the question is: can a human reach into the same live shell the AI is using, type a few commands, and hand control back — without restarting anything, without a separate session, without the AI and the human living on two different control paths that have to be reconciled? That question is not about dispatching a tool. It is about two actors sharing one execution state. Different layer, different invariants.
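The difference between the two layers shows up even in a small experiment. The sketch below assumes a POSIX `sh` is on the PATH and uses a plain pipe rather than a real PTY; the variable name `DEMO_TOKEN` and the labels "agent" and "human" are illustrative only.

```python
import os
import subprocess

env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}

# Controller-style dispatch: each call is a fresh, disposable process,
# so a variable set in one call is gone by the next.
subprocess.run(["sh", "-c", "DEMO_TOKEN=abc"], env=env, check=True)
second = subprocess.run(
    ["sh", "-c", "echo ${DEMO_TOKEN:-unset}"],
    env=env, capture_output=True, text=True, check=True,
)
dispatch_result = second.stdout.strip()

# One long-lived shell: whoever writes to its stdin acts on the same state.
shell = subprocess.Popen(
    ["sh"], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    env=env, text=True, bufsize=1,
)
shell.stdin.write("DEMO_TOKEN=abc\n")      # written by the "agent"
shell.stdin.write("echo $DEMO_TOKEN\n")    # written later by the "human"
shell.stdin.flush()
live_result = shell.stdout.readline().strip()

shell.stdin.close()
shell.wait()
shell.stdout.close()
```

In the first half the second command cannot see what the first one set, because each dispatch got its own process. In the second half both writes land in the same running shell, which is the property the take-over question depends on.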

The take-over / hand-back pattern

The cleanest litmus test for which model a system actually implements is the take-over / hand-back pattern.

Picture an agent halfway through a long migration. It has a shell open, environment variables set, a dev server running in the background, a half-applied set of changes on disk. It gets stuck. A human pauses it, types directly into that same live shell — fixes a broken auth token, restarts a wedged process, eyeballs the actual process tree — and then hands control back to the agent, which continues from the real, now-corrected state.

Take-over / hand-back over one continuous live shell: the AI drives at t0, a human attaches and takes over the same live state at t1, then hands back to the AI at t2 — one live state, not a copy

This only works if the human and the AI are operators on one execution state. If the AI owns a separate control path — if its shell is “the AI’s shell” reachable only through its tool interface — then human intervention means tearing down and rebuilding, or running a parallel session and hoping the two states converge. The whole value of the intervention is that both actors write to the same running object.

There is a subtlety the easy telling glosses: a take-over (and the hand-back after it) inherits the full mutable context of the running state — environment, staged commands, open handles — unsanitized. A handoff is therefore a trust-boundary crossing, not a transparent baton pass: the incoming actor’s authority and policy must be re-evaluated against the inherited state rather than assumed from the fact that the prior actor held control.
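One way to picture re-evaluation at handoff is a check that runs when control changes hands and inspects what the incoming actor is about to inherit. The sketch is deliberately small and entirely hypothetical: `Envelope`, `review_handoff`, and the `SUSPECT_KEYS` list are invented for illustration, and a real policy would look at far more than two environment variables.

```python
from dataclasses import dataclass

# Hypothetical list of inherited settings an incoming actor should not
# silently trust. A real policy would be far richer.
SUSPECT_KEYS = {"LD_PRELOAD", "PROMPT_COMMAND"}


@dataclass(frozen=True)
class Envelope:
    actor: str
    may_inherit_suspect_env: bool


def review_handoff(envelope: Envelope, inherited_env: dict) -> list:
    """Return the inherited settings the incoming actor must clear or
    explicitly accept before it is allowed to act."""
    if envelope.may_inherit_suspect_env:
        return []
    return sorted(key for key in inherited_env if key in SUSPECT_KEYS)


# State left behind by the previous holder of the write turn.
inherited = {"PATH": "/usr/bin", "LD_PRELOAD": "/tmp/hook.so"}

findings = review_handoff(Envelope("ai-agent", False), inherited)
trusted = review_handoff(Envelope("human", True), inherited)
```

The check is keyed to the incoming actor's own envelope, not to the fact that the previous actor was allowed to create that state.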

The industry is visibly converging on exactly this. OpenHands runs its agent against an execution server inside a container and additionally exposes a VS Code server port on that same container, so a human can attach to the live filesystem and terminal buffers when the agent gets stuck. Warp’s Agent Session Sharing publishes an agent session to a relay so that multiple participants — human and agent — can watch the same scrollback and steer the run; notably, it does this through grant-based asymmetric access — the sharer controls who may view versus interact, and edit rights are separately requested and approved — which is precisely transferable authority over one shared session rather than symmetric free-for-all control.

Neither team set out to write a manifesto about operator participation. Both arrived at it because the controller-on-top model could not cleanly answer “let a human and an agent work the same live state.” That convergence is the evidence.

The multi-actor invariants

If you describe the operator model at the level of what must be true rather than how to build it, four invariants fall out. These are the load-bearing properties; the specific mechanisms that satisfy them are an implementation concern (and, in some systems, a patented one — not the subject of this article).

Before the invariants, one concession that keeps the whole argument honest: the operator model is not inherently safer than the controller model — for prompt injection it is, if anything, more exposed. The controller model has a coarse but real chokepoint: every consequential effect passes through a per-call tool-dispatch gate that a policy can inspect and veto. The operator model dissolves that chokepoint — an actor operates the live state directly — so an injected or subverted operator gets a direct write to shared state with no pre-dispatch checkpoint standing in the way. Removing the tool-dispatch gate raises the injection blast radius. The operator model does not eliminate the safety obligation the controller’s gate discharged; it relocates it — from a coarse pre-dispatch gate into two substrate-level requirements: (a) per-actor permission envelopes that bound what each operator may do, and (b) a pre-commit interposition / gating seam that evaluates an effect before it lands. This is a trade the category must pay, not a free win, and the invariants below are written as that bill.
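The two relocated requirements can be sketched together: an envelope per actor, and a gate that runs before anything commits. Everything here is an assumption made for illustration. The names `Envelope` and `GatedState` are hypothetical, and prefix matching on command strings is a toy check that would not be a safe policy in practice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Envelope:
    actor: str
    allowed_prefixes: tuple   # command prefixes this actor may run


@dataclass
class GatedState:
    committed: list = field(default_factory=list)

    def submit(self, envelope: Envelope, command: str) -> bool:
        # Pre-commit gate: evaluated per actor, before the effect lands.
        if not command.startswith(envelope.allowed_prefixes):
            return False
        self.committed.append((envelope.actor, command))
        return True


human = Envelope("human", ("git ", "rm ", "systemctl "))
agent = Envelope("ai-agent", ("git ",))   # deliberately narrower envelope

state = GatedState()
ok = state.submit(agent, "git status")
blocked = state.submit(agent, "rm -rf build")
allowed_for_human = state.submit(human, "rm -rf build")
```

The same `submit` path serves both actors, so the mechanics stay uniform, while the outcome depends on whose envelope is presented.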

Multi-client steering of one live execution state: several human and AI actors concurrently steering, monitoring, automating, and inspecting the same shared real-time state

1. Uniform mechanics, per-actor permission envelopes. The substrate mechanics are uniform: coherence, ordering, and attribution treat a command, an edit, or an observation the same way regardless of whether a human or an AI submitted it. There is not one ordering rulebook for “the AI’s tool calls” and another for “the human’s keystrokes”; inputs are inputs at the level of how the state ingests them. But permission and policy are deliberately not flattened. Each actor carries its own authority envelope, and those envelopes may be asymmetric: an autonomous AI may operate under a narrower envelope than a human attached to the same state. Flattening permission across a human and an autonomous agent would be a safety regression, not a virtue. A safe operator model therefore separates the two: uniform mechanics so the state stays coherent, differentiated per-actor permission so authority is not handed out equally to actors that do not warrant it.
2. Authority is asymmetric but transferable, and preemption must be enforced, not requested. At any moment one actor may hold control; that asymmetry is real and, for safety, necessary. What matters is that control is not welded to any actor: it can be granted, requested, preempted, and handed off over the one shared live state, not by spawning a parallel session. Preemption, though, splits into two very different operations that the easy version of this argument conflates. Orderly handoff is cooperative: the holding actor reaches a yield point and control passes, which is fine for normal turn-taking. Halt / seize is the safety-critical one, and it cannot be cooperative. If a human can only regain control when the agent chooses to yield, then an agent that is mid-action, looping, or wedged cannot be stopped, and “the human can always take over” becomes a hope, not a guarantee. A safe operator model therefore requires a non-cooperative halt: a seize primitive enforced by the substrate, not contingent on the agent’s cooperation. Preemption that depends on the agent yielding is not preemption. Where the category cannot guarantee an enforced halt, “always” overclaims and should be stated as the conditioned invariant instead. One honest limit bounds even the enforced case: an enforced halt is a commit horizon, not a time machine. It bounds an actor’s future writes, not an effect already past the serialization point. On the normal in-host path, “the human seizes” stops the next turn; it does not retroactively un-commit a write the agent already submitted (the same honesty Part 7 gives the partition window, here extended to the normal-path seize). That last pre-halt write is still subject to invariant #4’s pre-commit gating seam and is originator-attributed, so the commit horizon bounds un-gated effects, not gated ones, closing the front-run gap where an actor races a destructive write in just ahead of the seize. The enforced halt guarantees no further action by the preempted actor, not the erasure of action already committed.
3. A single coherent execution state, and therefore a single shared trust boundary. All actors observe and mutate one canonical live state, not per-actor copies that drift and later have to be merged. The take-over / hand-back pattern is impossible the moment you have two states pretending to be one. But the same shared state that makes operator participation possible is also a shared attack surface: environment variables, PATH, aliases, a staged command, and an LD_PRELOAD hook are all mutable state that one actor can write and another then executes under its own authority. That is the classic confused-deputy shape (Hardy 1988, “The Confused Deputy”): actor A poisons the shared state, actor B acts on it, and the effect runs with B’s permissions. The per-actor authority envelope that bounds each operator is object-capability thinking (the object-capability model, Miller). A safe operator model must therefore evaluate authority at the moment of the acting actor’s input against the then-current state, not once at attach time, because the state B acts on may have been shaped by A. The shared live state is a shared trust boundary, not merely a shared workspace. This is the deepest of the four invariants, and it is where most current architectures quietly fall back to the controller model.
4. Provenance and attribution per actor: necessary, but detective, not preventive. Even though the state is shared and the mechanics are uniform, every mutation carries the identity of the actor that produced it. You can always answer “who ran this,” “who edited that,” “which agent took this branch,” across humans and machines alike. But attribution is after-the-fact forensics: it tells you who did something once it is done; it does not, by itself, stop anything. A reader should not mistake “we can attribute” for “we can control.” A safe operator model needs both attribution and a preventive property: a pre-execution gating / interposition seam that evaluates an actor’s authority against the current state before a mutation commits. Whether the layer provides that gating itself or hands it as an explicit non-invariant to a policy neighbor, it must be named as a requirement rather than assumed to fall out of attribution. Provenance is the detective control; gating is the preventive one, and an operator model that ships only the first is auditable but not safe.
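The enforced halt and its commit horizon can be illustrated with a toy substrate. This is a sketch under stated assumptions: `Substrate`, `submit`, and `seize` are hypothetical names, and a list of tuples stands in for committed effects on a real execution state.

```python
from dataclasses import dataclass, field


@dataclass
class Substrate:
    committed: list = field(default_factory=list)   # (actor, command)
    halted: set = field(default_factory=set)

    def submit(self, actor: str, command: str) -> bool:
        # The check lives in the substrate, so it does not depend on the
        # actor choosing to yield.
        if actor in self.halted:
            return False
        self.committed.append((actor, command))
        return True

    def seize(self, by: str, target: str) -> None:
        # Non-cooperative halt: takes effect without asking the target,
        # and is itself attributed to the actor that issued it.
        self.halted.add(target)
        self.committed.append((by, f"seize {target}"))


s = Substrate()
s.submit("ai-agent", "apply migration 1")    # lands before the halt
s.seize(by="human", target="ai-agent")
after = s.submit("ai-agent", "apply migration 2")
```

After the seize the agent's next write is refused, while the write it committed earlier is still in the record. That is the commit horizon in miniature: the halt bounds future writes and leaves past ones in place.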

A caveat keeps these invariants honest. Sharing one live execution state does not mean two actors blindly co-typing into one stdin — a PTY is a byte stream, not a mergeable structure, so simultaneous keystrokes produce noise, not coherence. Concurrent inputs are given a defined order and per-actor attribution, but the practical discipline over a single coherent state is turn-taking and explicit handoff. That is exactly why invariant #2 matters: transferable authority is the mechanism that makes one shared state usable by many actors without devolving into garbage.

One thing the invariants deliberately do not promise should be named as the explicit non-invariant it is. Invariant #2 covers the emergency, the enforced halt/seize, but it says nothing about the routine “your turn is next.” The turn-acquisition policy (who is granted the next non-urgent write turn, and whether a waiting operator blocks, queues, or is dropped) is application and policy, not a layer invariant. The layer guarantees ordering and attribution of submitted inputs and an enforced halt; it does not promise that a turn-acquisition contract falls out of “serialized writes.” That contract is handed to a policy neighbor, the same way the reaping threshold and the confused-deputy gating policy are (Part 7). The enforced halt of invariant #2 is, however, outside that policy’s authority: the turn-acquisition neighbor governs only the non-urgent write turn and may not gate, delay, or starve the human-preempt path, so a malicious or buggy turn policy cannot turn the non-cooperative halt back into a cooperative one.
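The separation between routine turn-taking and the halt path can be sketched as two code paths, only one of which consults the policy. `TurnQueue` and `Layer` are hypothetical names, and first-in-first-out is just one possible policy, chosen here for brevity.

```python
from collections import deque


class TurnQueue:
    """Hypothetical FIFO policy for the non-urgent write turn."""

    def __init__(self):
        self.waiting = deque()

    def request(self, actor):
        self.waiting.append(actor)

    def next_holder(self):
        return self.waiting.popleft() if self.waiting else None


class Layer:
    def __init__(self, policy):
        self.policy = policy
        self.holder = None

    def release_turn(self):
        # Routine turn-taking is delegated to the policy neighbor.
        self.holder = self.policy.next_holder()

    def seize(self, human):
        # The halt path never consults the policy, so the policy
        # cannot delay or starve it.
        self.holder = human


policy = TurnQueue()
policy.request("planner")
policy.request("implementer")

layer = Layer(policy)
layer.release_turn()
routine_holder = layer.holder
layer.seize("human")
```

Swapping `TurnQueue` for a different policy changes who gets the routine turn, but `seize` behaves the same regardless.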

State these as a contract and the difference from the controller model becomes mechanical rather than philosophical. The controller model satisfies none of them in the strong sense: rules differ between the model’s tool calls and the human’s out-of-band actions; control is welded to the model and cannot be handed off — a human can only observe, not take the wheel; there is no single shared live state (the tools are disposable); and attribution, if it exists, lives in a transcript rather than in the execution object.

Operator participation is not agents messaging each other

There is a second, easier-to-confuse pattern that must be ruled out: agent-to-agent messaging. Protocols in the A2A / agent-relay family standardize how separate agents pass structured messages — a planner asks an implementer to do work, an implementer asks a reviewer to check it, completion events get routed back. This is valuable and, like MCP, it is the right tool for its job.

But it is message passing between separate agents, each with its own state. The operator model is the opposite topology: multiple actors inside one shared execution state. The distinction is the same one that separates “two processes exchanging RPCs” from “two threads operating on shared memory.” A2A coordinates distinct execution contexts by relaying messages between them. Operator participation has no distinct contexts to relay between — the coordination happens through the state itself, because everyone is operating on the same object. Confusing the two leads to architectures that bolt a message bus between agents and call it collaboration, when what the take-over / hand-back pattern actually needs is a shared substrate.
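The topology difference is easy to see in miniature. In this sketch plain dictionaries stand in for execution state and a list stands in for a message relay; the actor names and the `branch` key are illustrative only.

```python
import copy

# Message passing: each agent keeps its own state and learns about the
# other's changes only through relayed messages.
planner_state = {"branch": "main"}
implementer_state = copy.deepcopy(planner_state)
outbox = []

implementer_state["branch"] = "fix/auth"
outbox.append({"from": "implementer", "set": {"branch": "fix/auth"}})

# Until the message is delivered, the two copies disagree.
drifted = planner_state["branch"] != implementer_state["branch"]

for msg in outbox:                     # delivery reconciles the copies
    planner_state.update(msg["set"])

# Shared state: both actors hold a reference to the same object, so there
# is nothing to relay or reconcile.
shared = {"branch": "main"}
human_view = shared
agent_view = shared
agent_view["branch"] = "fix/auth"
same_object = human_view is agent_view
```

In the first half, coordination is the work of getting copies to agree. In the second half there is only one object, so the question of agreement does not arise.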

Message passing (A2A): separate agents with their own states exchanging envelopes — versus shared execution state: all actors bound to one state with no copies to sync

Operator participation is not a shared ledger of commands

There is a third pattern that comes nearest to the operator model in spirit, and so is the most important to rule out: a shared workspace built as a ledger of commands. OpenAI’s publicly-described multi-agent shared-workspace system is the cleanest example — a coordinator agent invokes task agents, humans and agents alike post commands into an append-only, operational-transform command log, and each actor “yields or acts” in response to commands others have posted. Humans participate as peers by posting commands the same way an agent does. On the surface this looks exactly like “AI as a co-equal actor in a shared workspace,” and it is genuinely multi-actor.

But the shared object is a log of intents to apply — operational-transform commands that describe how the workspace should be modified — and the workspace is that command log, reconstructed by applying the commands in order. The operator model is the opposite construction: actors do not post commands describing changes to a ledger; they mutate one live OS execution state — the process tree, the PTY, the file descriptors, the sockets — in place, through one attach interface. There is no command ledger to append to, no operational-transform replay reconstructing the state, and no “yield or act” coordinator brokering turns. This is the same architectural fault line Part 3 drew against replay and Part 6 draws against CRDT/OT: a ledger-of-commands belongs to the document/operational-transform/replay family, where the record of what to do is the artifact; the operator model is live-shared-OS-state, where the running execution object is the thing itself, not a projection of a log over it.
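The two constructions can be contrasted in a few lines. This is a toy comparison, not a model of any particular system: the command tuples and the `replay` function are hypothetical, and a dictionary stands in for state that, in the operator model, would be a running process tree rather than something a log could rebuild.

```python
# Ledger of commands: the log is the artifact, and the state is a
# projection obtained by replaying it in order.
ledger = [
    ("set", "PORT", "8080"),
    ("set", "MODE", "dev"),
    ("unset", "PORT", None),
]


def replay(commands):
    state = {}
    for op, key, value in commands:
        if op == "set":
            state[key] = value
        elif op == "unset":
            state.pop(key, None)
    return state


projected = replay(ledger)

# Live shared state: actors mutate the running object in place, and there
# is no log from which it is rebuilt.
live = {}
live["PORT"] = "8080"
live["MODE"] = "dev"
del live["PORT"]
```

For a dictionary the two routes end in the same place. The architectural difference is which thing is authoritative: in the ledger family it is the record of commands, while in the operator model it is the running object itself.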

Why this matters now

Three forces make the operator model not just elegant but necessary.

Human-in-the-loop at scale. As agents run for hours and take hundreds of consequential actions, “approve every tool call” does not scale and “let it run unattended” is reckless. The workable middle is fluid intervention — drop into the live state when something looks wrong, fix it in place, step back out. That requires operators on the live state, not a controller you can only observe through a transcript.

Multi-agent plus human coordination. Once you have a planning agent, an implementation agent, a verification agent, and a human reviewer, the controller-on-top model has no natural seat for everyone. Who is on top? The honest answer is that control is a role that moves between actors, not a fixed hierarchy — a human, a planner, an implementer can each hold the wheel at different moments, over one shared body of work, with a human able to take it back at any time. The shared-state model has a seat for each of them by construction.

Devices and services as first-class actors. A sensor that streams readings into the state, a deployment service that mutates it on a webhook, a phone that reattaches to a session the laptop started — these are not “tools the AI calls.” They are participants with their own initiative, observing and mutating the same execution state on their own schedule. The controller model has nowhere to put an actor that acts without being called. The operator model treats it as just another actor.

The historical line points here

The lineage (traced in full in the first article) is consistent if you read it as a slow migration of execution state out from under any single client — tmate pushing attachments across the network through a relay, Jupyter decoupling a live kernel so any client could attach to the same in-memory state. Agent runtimes then added autonomous actors to that picture — and immediately discovered they needed humans to be able to reach into the same live state the agent was using.

Each step loosened the assumption that one client owns the execution. The operator model is what you get when you finish the job: the execution state is the durable object, and every participant — human, AI, device, service — is a client that attaches to it through the same interface and addressing model — uniform mechanics, asymmetric (transferable) authority. Equal access to the interface is not equal authority: actors share how they observe and command the state, while their permission envelopes remain per-actor and may be asymmetric.

The distinction, drawn at the execution boundary

The edge this article draws sits at the position the AI occupies:

Operator model ≠ controller model. MCP-style tool dispatch (the controller model) puts the model above the tools and treats them as disposable. The operator model puts the AI inside the same execution model as humans and devices, sharing the same observe-and-command interface and coherence rules. That is equality of mechanism and addressing, explicitly not equality of authority or permission. It comes with per-actor (and possibly asymmetric) permission envelopes; asymmetric but transferable control, in which any actor, including a human, can take over and hand back, backed for safety by a substrate-enforced halt rather than the agent’s cooperation; serialized and attributed writes; and per-actor provenance over a single coherent live state.

It is a distinction about position relative to the execution boundary, and everything else — fluid human-in-the-loop, multi-agent coordination, devices as actors — follows from getting that position right.

Operator participation is what the convergence is reaching for

The systems drifting toward this — OpenHands’ attach-to-the-container intervention, Warp’s collaborative agent sessions — are all reaching for the same architectural object: a command-operator execution layer in which humans, AI agents, devices, and services attach to one running execution as operators — a single, ownerless state single-homed on one host, with the operators distributed around it — rather than a stack with the AI on top calling tools below.

cmdop (cmdop.com) is built explicitly on this operator model: a persistent execution state that humans and AI agents operate under uniform mechanics with per-actor provenance and per-actor permission envelopes, where control is asymmetric but transferable, so that taking over a run and handing it back is a first-class operation rather than a workaround. The category’s safety obligation, the invariant that earns the word “always,” is to make a human’s preemption enforceable: an orderly handoff for normal turn-taking, and a non-cooperative halt/seize the substrate guarantees rather than the agent grants. Where the substrate enforces that halt, a human can take control back; preemption that depends on the agent yielding is not preemption. So “a human can take over” is a property the layer enforces, not a hope that the agent yields. It is offered here not as the point of the argument but as one reference implementation of the category the industry is converging on.

The controller-on-top model gave us tool-calling, and tool-calling is here to stay. But the question that defines the next layer is not “what can the AI call?” It is “what can the AI participate in — as one actor among many, on the same live state, under the same rules as everyone else?”

See it in the product: the operator model in cmdop — AI as operator (asymmetric, transferable authority) and the operator execution identity shared across every interface.