Jun 27, 2026
Context Rot, Not Model Quality, Is What Slows Your Agent Down
A live fuel gauge for usage, just-in-time file retrieval instead of preloading, model routing, and a review trail after the run — the workflow around the agent, not the agent itself.

The question developers ask about coding agents has quietly changed. It used to be "which agent is best." Now the people doing serious work have mostly stopped asking that, because the answer barely moves the needle on their actual day. What slows them down is not the model's intelligence. It is the workflow around the model: context that rots, spending they cannot see, routing they cannot control, and changes they cannot review afterward. The agent is fine. The plumbing is the bottleneck.
This is a healthier framing, because it points at problems you can solve. You cannot make the model meaningfully smarter from the outside. You can absolutely fix how context is loaded, how usage is surfaced, how models are routed, and how a session is reviewed. Those are engineering problems with engineering answers, and they determine more of the experience than the choice of frontier model does.
Preloading is how context rots
A common instinct is to give the agent everything up front — preload a pile of files with @-references so it "has the context." It backfires. The context window fills with material that is mostly irrelevant to the current step, the signal-to-noise ratio collapses, and the agent's attention degrades. This is context rot: not too little context, but too much of the wrong context, crowding out the parts that matter.
The better pattern is just-in-time retrieval — pull the specific files the current step needs, when it needs them, instead of front-loading the whole project and hoping. This keeps the working context sharp and the relevant material prominent. It is a retrieval-discipline problem, and it is solved in the layer that decides what the agent sees, not in the model that consumes it.
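A minimal sketch of what just-in-time selection could look like, assuming the harness holds an in-memory map of file paths to contents. The names `select_files` and `budget_chars`, and the keyword-overlap scoring, are illustrative assumptions; a real harness would likely use a stronger relevance signal such as search, embeddings, or the dependency graph.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real relevance signal."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def select_files(step: str, files: dict[str, str],
                 budget_chars: int, top_k: int = 3) -> list[str]:
    """Pick the few files most relevant to the current step, within a size budget."""
    step_words = tokenize(step)
    scored = []
    for path, body in files.items():
        overlap = len(step_words & tokenize(path + " " + body))
        if overlap:  # files with no overlap never enter the context
            scored.append((overlap, path))
    # Highest overlap first; ties broken by path so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen, used = [], 0
    for _, path in scored[:top_k]:
        size = len(files[path])
        if used + size <= budget_chars:
            chosen.append(path)
            used += size
    return chosen


# Hypothetical project contents, for illustration only.
project = {
    "auth/login.py": "def login(user, password): check password hash",
    "billing/invoice.py": "def render_invoice(order): totals and tax",
    "README.md": "project overview and setup",
}
print(select_files("fix the password check in login", project, budget_chars=200))
```

The point of the sketch is the shape, not the scoring: the step description drives what gets loaded, and everything else stays out of the window until a later step asks for it.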
A session needs a fuel gauge
One of the most requested things is also one of the simplest: a live view of usage as the session runs. Developers want a fuel gauge so they do not slam into a limit mid-task, with work half-done and context about to evaporate. Hitting the wall unexpectedly is not just annoying; it can strand you in the middle of a change, which is the worst possible moment to lose the agent.
Token burn also gets mistaken for productivity, when it is often the opposite — a sign the agent is thrashing, not progressing. Surfacing usage live turns an invisible meter into a control: you can see when a session is spending efficiently and when it is burning fuel going nowhere, and you can stop before the wall instead of after it.
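As a rough illustration, a gauge can be little more than a running counter with two thresholds: one for nearing the limit, one for spending without progress. The class below is a sketch under stated assumptions: `UsageGauge`, `warn_at`, and `thrash_after` are hypothetical names, and what counts as "progress" (a passing test, an accepted edit) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class UsageGauge:
    limit_tokens: int          # session limit, assumed known to the harness
    warn_at: float = 0.8       # fraction of the limit that triggers a warning
    used_tokens: int = 0
    since_progress: int = 0    # tokens spent since the last sign of progress

    def record(self, tokens: int, made_progress: bool) -> None:
        """Call once per agent step with that step's token count."""
        self.used_tokens += tokens
        self.since_progress = 0 if made_progress else self.since_progress + tokens

    @property
    def remaining(self) -> int:
        return max(self.limit_tokens - self.used_tokens, 0)

    def status(self, thrash_after: int = 20_000) -> str:
        """'low' near the limit, 'thrashing' when spend outruns progress, else 'ok'."""
        if self.used_tokens >= self.warn_at * self.limit_tokens:
            return "low"
        if self.since_progress >= thrash_after:
            return "thrashing"
        return "ok"


gauge = UsageGauge(limit_tokens=100_000)
gauge.record(10_000, made_progress=True)
gauge.record(25_000, made_progress=False)
print(gauge.status(), gauge.remaining)
```

Separating "low" from "thrashing" reflects the two failure modes described above: running out of fuel, and burning it without getting anywhere.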
Routing is a decision the agent should not make alone
Not every step deserves the expensive model. Cheap steps should run on cheap models; only the hard reasoning should escalate. Yet agents often spawn costly sub-agents by default, ignoring any routing preference, when a deterministic policy — this kind of work goes to that model — would be both cheaper and more predictable. Developers want their cross-provider routing respected, not overridden by an agent that reaches for the most expensive option automatically.
Routing control is leverage. It is the difference between paying premium rates for trivial work and spending the premium only where it changes the outcome. And like context loading, it lives in the harness — the layer that decides which model handles which step — not in any single model's behavior.
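A deterministic policy of this kind can be as plain as a lookup table. The sketch below assumes the harness can label each step with a kind; the step kinds and the model names (`small-model`, `mid-model`, `large-model`) are placeholders, not real model identifiers.

```python
from typing import Optional

# Hypothetical policy: step kind -> model tier. Edit to match your own providers.
ROUTES = {
    "rename": "small-model",
    "format": "small-model",
    "write_tests": "mid-model",
    "design": "large-model",
    "debug": "large-model",
}
DEFAULT_MODEL = "small-model"  # unknown work starts cheap and escalates explicitly


def route(step_kind: str, override: Optional[str] = None) -> str:
    """Return the model for a step; a user override always wins over the table."""
    if override is not None:
        return override
    return ROUTES.get(step_kind, DEFAULT_MODEL)


print(route("rename"))                         # small-model
print(route("debug"))                          # large-model
print(route("rename", override="mid-model"))   # mid-model
```

Defaulting to the cheap tier is a design choice in this sketch: escalation becomes an explicit decision rather than something the agent reaches for on its own.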
The run is not done until it is reviewed
The last gap is what happens after the run. A session ends having produced terminal output, a git diff, and test logs scattered across the screen, and reviewing what the agent actually changed means reassembling all of that by hand. People build local report tools precisely because this reconstruction is tedious, and skipping it means merging changes nobody fully reviewed. Recovering a wiped chat history falls in the same bucket: the record of what happened is valuable, and the tooling keeps losing it.
A review trail closes the loop: a consolidated, inspectable record of what the agent did, ready to check before anything merges. Add it to just-in-time context, a live usage gauge, and explicit routing, and the picture is complete — a control plane for the work around the agent. None of it requires a better model. All of it requires taking the workflow as seriously as the intelligence, which is exactly where the real friction has been hiding.
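One way to picture a review trail is a single record that gathers the commands, the diff, and the test log, then renders them as one report. The sketch below is an assumption-laden illustration: `ReviewTrail` is a hypothetical name, the inputs are passed in as strings rather than captured from a live session, and changed files are read from the `+++ b/` header lines of a git-style unified diff, which is a crude heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewTrail:
    commands: list[str] = field(default_factory=list)
    diff: str = ""       # unified diff text, e.g. captured from `git diff`
    test_log: str = ""

    def changed_files(self) -> list[str]:
        """Paths taken from '+++ b/<path>' header lines (deleted files are skipped)."""
        prefix = "+++ b/"
        return [line[len(prefix):] for line in self.diff.splitlines()
                if line.startswith(prefix)]

    def render(self) -> str:
        """One consolidated, plain-text report to read before merging."""
        lines = ["AGENT RUN REVIEW", "", "Commands run:"]
        lines += [f"  $ {cmd}" for cmd in self.commands]
        lines += ["", "Files changed:"]
        lines += [f"  {path}" for path in self.changed_files()]
        lines += ["", "Test log:", self.test_log.strip(), "", "Diff:", self.diff.strip()]
        return "\n".join(lines)


# Hypothetical session data, for illustration only.
trail = ReviewTrail(
    commands=["pytest tests/test_login.py"],
    diff="--- a/auth/login.py\n+++ b/auth/login.py\n@@ -1 +1 @@\n-old\n+new\n",
    test_log="1 passed",
)
print(trail.render())
```

Writing the rendered report to disk at the end of every run would also keep a record that survives a wiped chat history.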