We’ve reached a point where we aren’t just writing code anymore; we are managing the behavior of systems that write it for us.
In VS Code, this shift is physical. You have Ask for explanation, Plan for intent, and Agent for action. Most teams treat these as interchangeable entry points, which is a mistake. These are distinct levels of autonomy, and each carries a different profile of risk.
The problem is that there is no built-in arbitration between them. Copilot looks at your copilot-instructions.md as a suggestion; agents operate with their own internal prompts and permissions. The “Plan” phase often sits in a vacuum before being handed off to execution. They don’t reconcile; they drift. They can, and frequently do, disagree. If you don’t build a system to resolve those conflicts, the system won’t do it for you.
Our mental model is simple: Ask talks, Plan thinks, Agent acts. Control must live where the system actually changes.
The /delegate Protocol
The most common failure point is the “lossy” transition between a Plan and an Agent. To fix this, we moved to a formal /delegate step.
We think of it as a contractual handshake, much like the old days when teams met to discuss APIs as formal contracts between software domains. When we delegate a plan to an agent, it isn’t a loose goal; it’s a bounded execution. The agent is restricted to the steps in that specific plan. If it hits an edge case, the protocol is to stop and request a “Plan Amendment” rather than improvising. This creates a hard audit trail between our intent and the machine’s output.
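The contract idea can be sketched in code. This is a minimal illustration, not our actual tooling; all names here (`DelegatedPlan`, `PlanAmendmentRequired`, and so on) are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class StepStatus(Enum):
    PENDING = auto()
    DONE = auto()

@dataclass
class PlanStep:
    id: str
    description: str
    status: StepStatus = StepStatus.PENDING

@dataclass
class DelegatedPlan:
    """A bounded execution contract handed from Plan to Agent."""
    plan_id: str
    steps: list[PlanStep] = field(default_factory=list)

    def allowed(self, step_id: str) -> bool:
        return any(s.id == step_id for s in self.steps)

class PlanAmendmentRequired(Exception):
    """Raised when the agent hits work outside the contract."""

def execute(plan: DelegatedPlan, requested_step: str) -> PlanStep:
    # The agent may only perform steps enumerated in the plan;
    # anything else must stop and go back through a Plan Amendment.
    if not plan.allowed(requested_step):
        raise PlanAmendmentRequired(
            f"step '{requested_step}' is outside plan {plan.plan_id}"
        )
    step = next(s for s in plan.steps if s.id == requested_step)
    step.status = StepStatus.DONE
    return step
```

The point of the sketch is the failure mode: an out-of-contract step raises rather than improvises, which is exactly what produces the audit trail between intent and output.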
We started by externalizing our core rules. Instead of letting behavior emerge from random prompts by our engineering team, we created a single source of truth for architecture, data handling, and compliance. Both the Copilot chat and the agents reference these documents. To be clear, the goal isn’t “better prompting”; it’s a unified governance layer that the AI cannot ignore.
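For concreteness, here is the shape such a rules file can take, in the repository location Copilot reads custom instructions from. The sections and rules below are illustrative placeholders, not our actual policy:

```markdown
<!-- .github/copilot-instructions.md (illustrative excerpt) -->
## Architecture
- New services must go through the existing API gateway; no direct
  service-to-service HTTP calls.

## Data handling
- Fields tagged `pii` are never written to logs or test fixtures.

## Compliance
- Every schema migration ships with a documented rollback step.
```

Because both the chat surface and the agents read the same file, a rule changed here changes behavior everywhere at once.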
The “AI Layer” Technical Debt
Even with this structure, we hit a wall: instructions influence behavior, but they don’t enforce it. We had to implement a real enforcement layer: type checks, schema validation, and CI gates.
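To make “enforce” concrete, here is the kind of gate we mean: a small check that fails CI when agent-produced output drifts from what the instructions mandate. The field names and the schema are hypothetical; the pattern is what matters:

```python
import json
import sys

# Hypothetical gate: required fields our instructions say every
# agent-produced change manifest must carry.
REQUIRED_FIELDS = {
    "owner": str,        # team accountable for the change
    "data_class": str,   # compliance tag, e.g. "pii" or "public"
    "rollback": str,     # documented rollback step
}

def validate(manifest: dict) -> list[str]:
    """Return a list of violations; an empty list means the gate passes."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in manifest:
            errors.append(f"missing required field: {name}")
        elif not isinstance(manifest[name], expected):
            errors.append(f"{name} must be {expected.__name__}")
    return errors

def gate(path: str) -> int:
    """CI entry point: nonzero exit fails the pipeline."""
    with open(path) as fh:
        errors = validate(json.load(fh))
    for err in errors:
        print(f"GATE FAIL: {err}", file=sys.stderr)
    return 1 if errors else 0
```

Unlike an instruction file, this check cannot be ignored: when the agent (or a human) omits the rollback step, the pipeline goes red instead of a reviewer having to notice.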
But there’s a psychological hurdle here. Engineers naturally want to “just fix the code” when an agent misses a detail. It feels faster. From a leadership perspective, however, that is a trap. It creates Technical Debt in the AI Layer. If you solve a ticket manually but ignore the flawed instruction that led the agent astray, you’ve made a local optimization at the cost of global efficiency. You’ve fixed the symptom but left the machine broken. We had to retool our team to be orchestrators rather than just craftsmen; our job now is to be architects of autonomous behavior.
Making Intent Visible
We’ve made capturing these interactions a mandatory part of the workflow: we use VS Code’s “Export Chat” command to drop the session transcript, as Markdown, into the pull request.
When an agent handles a PR, the session view, showing the sequence of decisions and test runs, becomes a replay of its logic. For a reviewer, this is a game-changer. Instead of reverse-engineering a massive diff, you review the Plan-to-Code diff. Does the implementation honor the /delegate contract?
For leadership, these artifacts reveal patterns. We can see exactly where agents struggle and refine our global instructions for the entire org. This shift also requires a new mechanism for evaluating engineers; we now measure our people by their ability to use and tune these AI systems. In this new era, an engineer who refuses to adapt to the orchestrator model or fails to master the governing of these agents becomes a bottleneck; quite frankly, they have to go.
Our current practice is a strict sequence: Plan first, capture it, delegate it, execute. It turns out we aren’t just managing code anymore; we are managing the behavior that generates it. The real question isn’t whether the AI wrote “good” code; it’s whether our governance allows bad assumptions to reach production. Once you solve for the protocol, the code takes care of itself.