Jun 20, 2026
Where Does Your Code Actually Go? Data Boundaries for AI Coding
A thread titled 'I think the agent just generated proprietary code for our core product' is the new normal. Here's how to reason about what leaves your machine — and keep the line where you want it.

One of the most upvoted recent threads in r/cursor has a title that should make any engineering lead pause:
"I think the agent just generated proprietary code for our core product."
Read it as: I think our core product's code just went to a hosted model, and I'm not sure what happens to it now. Over in r/LocalLLaMA the same anxiety shows up from the other direction — long arguments about whether Claude Code and Codex sessions become training data, and what it means that a handful of vendors sit in the path of nearly everyone's coding.
Strip away the drama and there's a real, answerable engineering question underneath: what, exactly, leaves your machine when you use an AI coding tool — and can you move that line?
The boundary moved and nobody redrew it
For years the security boundary around source code was simple and physical. Code lived in your repo, on your laptop, on your CI runners. It left only when you pushed it somewhere you chose.
Agentic coding tools quietly moved that boundary without telling anyone where the new one is. To answer a prompt, an agent gathers context: the file you're in, files it imports, files it greps, sometimes a broad sweep of the repo. That context, which is your proprietary code, gets sent to a model endpoint to produce a response. Most of the time that's exactly what you want. The problem is that the scope of what gets sent is opaque, automatic, and often far larger than the single function you were thinking about.
The "I think the agent just generated proprietary code" moment is really the moment someone realizes they never actually knew where the boundary was.
The three questions worth answering
You don't need to panic, and you don't need to ban AI tooling. You need to be able to answer three concrete questions for whatever you're running:
- What context leaves the machine per request? Just the open file, or a wide retrieval sweep across the repo? The blast radius of a single prompt is the thing most people never check.
- What's the retention and training policy on the receiving end? Is input stored? For how long? Is there a real, verifiable opt-out from training, or just a checkbox? Business and enterprise tiers usually have stronger guarantees than consumer ones — but only if you're actually on them.
- Does this particular work even need a frontier model? A rename, a regex, a boilerplate test, a quick explanation of a stack trace — a competent local model handles plenty of day-to-day tasks. The frontier API is worth reaching for on the hard 20%, not the routine 80%.
Answer those and "is my code safe?" stops being a vague dread and becomes a policy you can actually set.
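To make the first question concrete, here is a minimal sketch of a blast-radius estimate for a Python repo. It assumes, purely for illustration, that an agent pulls in the open file plus every repo-local module reachable through absolute imports. Real tools choose context differently (grep hits, embeddings, open tabs), so treat the result as a rough order-of-magnitude estimate, not a record of what any particular product sends. The names `local_imports` and `context_footprint` are hypothetical.

```python
import ast
from pathlib import Path


def local_imports(path: Path, root: Path) -> set[Path]:
    """Return repo files that `path` imports via absolute imports.

    Assumption: a module `a.b` lives at `<root>/a/b.py`. Relative imports
    and package `__init__.py` files are ignored to keep the sketch short.
    """
    found: set[Path] = set()
    tree = ast.parse(path.read_text(encoding="utf-8"))
    for node in ast.walk(tree):
        names: list[str] = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        for name in names:
            candidate = root / (name.replace(".", "/") + ".py")
            if candidate.is_file():  # stdlib and third-party modules drop out here
                found.add(candidate)
    return found


def context_footprint(open_file: Path, root: Path) -> dict:
    """Follow imports transitively from the open file; report files and bytes."""
    seen: set[Path] = set()
    stack = [open_file]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(local_imports(current, root) - seen)
    return {
        "files": sorted(str(p.relative_to(root)) for p in seen),
        "bytes": sum(p.stat().st_size for p in seen),
    }
```

Calling `context_footprint(Path("src/app.py"), Path("src"))` on your own repo gives a file list and a byte count for one plausible request. If the list is much longer than you expected, that gap is the boundary you had not drawn.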
Draw the boundary on purpose
The fix isn't a tool, it's a posture: decide where your data boundary is before the agent decides for you.
- Tier your work by sensitivity. Core IP and anything under NDA or compliance is one tier; throwaway scripts and open-source side projects are another. They don't deserve the same routing, and treating them identically is how accidents happen.
- Route the sensitive tier to a local or self-hosted model. When the context never leaves your machine or your own infrastructure, the retention and training questions become moot: there is no third party on the other end to retain anything. Local models have gotten good enough that this is a real option for a large share of routine work, not a sacrifice.
- Reserve cloud frontier models for the hard problems, on an account tier whose retention and training terms you've actually read — and ideally with the scope of sent context kept tight.
- Make the boundary visible. The worst setups are the ones where nobody can say what leaves the machine. Knowing is half the control.
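The posture above can be sketched as a small routing rule that runs before anything is sent. This is an illustration under stated assumptions, not how any specific tool works: the path patterns, the `"local"` and `"cloud"` target names, and the functions `route_request` and `audit_line` are all hypothetical placeholders you would replace with your own tiers and endpoints.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical sensitivity tier: replace with patterns that match your repo.
SENSITIVE_PATTERNS = ["core/*", "billing/*", "contracts/*", "*.pem"]


@dataclass(frozen=True)
class Route:
    target: str  # "local" or "cloud" (placeholder names)
    reason: str


def is_sensitive(path: str) -> bool:
    """True if the path matches any sensitive pattern (fnmatch's * also spans '/')."""
    return any(fnmatch(path, pattern) for pattern in SENSITIVE_PATTERNS)


def route_request(context_paths: list[str], needs_frontier: bool) -> Route:
    """Pick a target before anything is sent; sensitivity always wins."""
    flagged = [p for p in context_paths if is_sensitive(p)]
    if flagged:
        return Route("local", "sensitive context: " + ", ".join(flagged))
    if needs_frontier:
        return Route("cloud", "no sensitive context and task marked as hard")
    return Route("local", "routine task, local model by default")


def audit_line(route: Route, context_paths: list[str]) -> str:
    """One human-readable line per request, so the boundary is visible."""
    return f"target={route.target} files={len(context_paths)} reason={route.reason}"
```

The design choice worth noting is the order of the checks: a single sensitive file in the context forces the local route even when the task is marked as hard, so the default fails closed rather than open.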
How 1DevTool fits
1DevTool is built around the assumption that you decide where your code goes, not the tool. It runs as your workspace and terminal, so the agent layer lives where your code already is — and it's model-agnostic by design: point routine, sensitive work at a local model and reach for a cloud frontier model only when the task genuinely warrants it. The boundary becomes a choice you make per task, instead of a default someone else set for you.
The deeper principle is the same one driving the move toward local-first AI memory: for the data you actually care about, the strongest privacy guarantee isn't a policy promise on a vendor's server — it's the data never leaving your machine in the first place.
The takeaway
"I think the agent just sent our core product to a cloud model" is not a freak event. It's the predictable result of a boundary that moved while everyone's mental model stayed put. You don't fix it by swearing off AI tooling — you fix it by knowing what leaves the machine, deciding what's allowed to, and routing the rest to a model that runs where your code already lives. Draw the line yourself, on purpose. The default won't.