Ask teams why they hesitate to let coding agents loose on a real codebase and the answer is rarely "the code quality is bad." It is something subtler and scarier: the agent fixes exactly what you asked for, and along the way it quietly edits something you did not ask about. A shared utility. A database migration. A config every other service depends on. The requested change is correct. The unrequested change is the problem, and you did not see it happen.

This is the failure mode that keeps agents out of serious work. It is not that they cannot code. It is that you cannot bound them, and an actor you cannot bound is one you have to fully re-review, which erases the speed that made the agent attractive in the first place. The blocker is not capability. It is scope.

The dangerous edit is the invisible one

A wrong change that breaks the build is safe, in a sense, because it announces itself. The dangerous change is the one that works — the silent touch to a shared module that compiles, passes the obvious test, and ships, only to surface as a strange failure in a different service three days later. Autonomous agents are particularly good at producing these, because they reason about the whole codebase and "helpfully" improve things adjacent to the task.

For an enterprise team, this is disqualifying. The entire point of scoping work is that a change to the auth utility gets the scrutiny a change to the auth utility deserves. An agent that edits it as a side effect of an unrelated ticket has routed a high-risk change around every control the team built. The trust problem is not "will it write good code." It is "will it stay where I put it."

Persistent memory is the other half of the problem

Scope and memory turn out to be the same problem seen from two sides. Teams want the agent to hold architecture context without re-scanning an enormous repo every session and burning tokens to do it. They also want it to respect boundaries. Both depend on the agent having a durable, accurate model of the project (what exists, what depends on what, and what it is allowed to touch) that persists across sessions instead of being rebuilt, expensively and imperfectly, every time.

Without persistent project memory, the agent cannot even know what is out of scope, because it does not reliably know what the shared utilities are or which migration is load-bearing. Memory and guardrails reinforce each other: you cannot enforce a boundary the agent does not understand, and you cannot trust a boundary the agent rediscovers from scratch each run.
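
One way to picture that durable model is a small record saved between sessions and reloaded at the start of the next one. The sketch below is illustrative only: the `ProjectMemory` class, its field names, and the example paths are assumptions made for this article, not the format of any particular agent or tool.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ProjectMemory:
    """Hypothetical durable project model kept between agent sessions."""

    # Glob patterns for paths the agent may not edit without explicit approval.
    protected: list[str] = field(default_factory=list)
    # Maps a path to the components known to depend on it.
    dependents: dict[str, list[str]] = field(default_factory=dict)

    def is_protected(self, path: str) -> bool:
        # fnmatchcase's "*" also matches "/", so "libs/auth/*" covers subfolders.
        return any(fnmatchcase(path, pattern) for pattern in self.protected)

    def blast_radius(self, path: str) -> list[str]:
        # Who is affected if this path changes, according to stored knowledge.
        return sorted(self.dependents.get(path, []))

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ProjectMemory":
        data = json.loads(text)
        return cls(protected=data["protected"], dependents=data["dependents"])


# Example paths and service names are made up for illustration.
memory = ProjectMemory(
    protected=["libs/auth/*", "db/migrations/*"],
    dependents={"libs/auth/token.py": ["gateway", "billing"]},
)
restored = ProjectMemory.from_json(memory.to_json())  # simulates a new session
```

The point of the round trip is that the second session starts with the boundary already known, rather than inferring it again from a fresh scan.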

Multi-repo is where cloud agents fall down

The scope problem sharpens across repositories. A cloud agent confined to a single service's repository can be weak precisely because it cannot align API contracts across repos: it does not see that the change it is making here breaks the contract over there. Local, multi-repo context is more useful for exactly this reason. The boundary that matters most, the interface between services, is invisible to an agent that can only see one side of it.

This is why "it works in the demo repo" does not translate to a real organization. Real systems are many repos with contracts between them, and scope enforcement has to understand those contracts, not just the file currently open.
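
To make "understanding the contract" concrete, here is a minimal sketch of a cross-repo check. It assumes a deliberately simplified picture in which a contract is just the set of fields a provider returns and each consumer repo declares the fields it reads. The function name, field names, and service names are hypothetical; real contracts (OpenAPI, protobuf, and so on) carry far more than field names.

```python
from __future__ import annotations


def breaking_fields(
    old_fields: set[str],
    new_fields: set[str],
    consumers: dict[str, set[str]],
) -> dict[str, list[str]]:
    """Return, per consumer, the fields it reads that the new contract drops."""
    removed = old_fields - new_fields
    broken: dict[str, list[str]] = {}
    for consumer, used in consumers.items():
        lost = sorted(used & removed)
        if lost:
            broken[consumer] = lost
    return broken


# Hypothetical example: the provider's change drops "plan" from a response.
old = {"id", "email", "plan"}
new = {"id", "email"}
consumers = {"billing": {"id", "plan"}, "notifications": {"email"}}
print(breaking_fields(old, new, consumers))
```

An agent that sees only the provider repo has `old` and `new` but not `consumers`, which is exactly the input that turns a harmless-looking cleanup into a detected break.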

Proof of scope is the deliverable

The resolution is not a smarter agent. It is a control layer that makes scope explicit and provable: declare what the agent may touch, enforce it, and produce evidence afterward that it stayed inside the lines. Teams that test agents against planted bugs in realistic apps are really asking this question — not "can it find the bug" but "can I trust what it did and did not touch." The answer that unblocks enterprise adoption is a reviewable record showing the agent changed what it was supposed to and nothing else.
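
A minimal sketch of that declare, enforce, and prove loop might look like the following. It assumes scope is declared as glob patterns and that the list of changed paths comes from somewhere trustworthy, such as the output of `git diff --name-only`. The `ScopeReport` and `check_scope` names are invented for this illustration and do not refer to an existing tool.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ScopeReport:
    """Reviewable record of what was allowed and what was actually changed."""

    allowed: list[str]
    in_scope: list[str]
    out_of_scope: list[str]

    @property
    def ok(self) -> bool:
        return not self.out_of_scope

    def to_json(self) -> str:
        record = asdict(self)
        record["ok"] = self.ok
        return json.dumps(record, indent=2, sort_keys=True)


def check_scope(allowed: list[str], changed: list[str]) -> ScopeReport:
    """Classify each changed path against the declared scope patterns."""
    in_scope: list[str] = []
    out_of_scope: list[str] = []
    for path in sorted(changed):
        # fnmatchcase's "*" also matches "/", so one pattern covers a subtree.
        matched = any(fnmatchcase(path, pattern) for pattern in allowed)
        (in_scope if matched else out_of_scope).append(path)
    return ScopeReport(list(allowed), in_scope, out_of_scope)


# Hypothetical ticket scoped to the billing service only.
report = check_scope(
    allowed=["services/billing/*"],
    changed=["services/billing/api.py", "libs/auth/token.py"],
)
print(report.to_json())
```

In this example the edit to `libs/auth/token.py` lands in `out_of_scope`, so `ok` is false and the change can be blocked or escalated. The JSON record is the part a reviewer reads instead of re-auditing the whole diff.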

Until that record exists, every agent change is a full audit, and the agent saves no one any time. The blocker was never the model. It was the missing proof that the agent stayed in scope.