Consent doesn't stop at training: data and attribution in the agentic era
Most of the data-and-AI debate is still about training: who was scraped, who gets paid. While that argument runs, the ground has moved. Agents now read your live data, act on your behalf, and pass context to other agents. That changes what consent has to cover, and it changes attribution from a statistics problem into an accounting one.
We spent a lot of words last time on why it is so hard to trace a model’s output back to the data baked into its weights. The short version: during pre-training, influence is diffuse, the geometry fights you, and exact per-output attribution is mostly a fantasy.
Here is the thing almost no one is saying out loud. That problem, the hard one, is about a world that is already half gone. The frontier is not a model trained once on a frozen corpus and then shipped. The frontier is an agent: a model wired to tools, reading live data, taking actions, calling other agents, and carrying memory between sessions. And in that world, the data question is different, the consent question is different, and oddly enough, attribution gets easier in exactly the place training made it hard.
The shift in one sentence
Training-era consent asked, “may you learn from my data?” Agentic consent has to ask, “may you read this, use it, and act on it right now, for this purpose, within these limits, and who is accountable when you do?”
That is a different shape of question. It is about the present tense, not the past. It is scoped to a task, not a corpus. And it is revocable in a way that a trained weight never really is, because the agent has to come back and ask again the next time it acts.
Where the data actually flows now
When people picture AI and data, they picture the scrape: a crawler hoovering up the open web into a training set. Agents move data in ways that picture misses entirely.
An agent reads your live context. It opens your documents, your email, your calendar, your codebase, through tool calls and connectors. None of that is training data. It never touches the weights. It flows into the context window, shapes one set of actions, and is gone. The influence is real and immediate, and it is also, crucially, observable, because it happened in a request you can log rather than in a gradient update you cannot.
An agent takes actions. It sends the email, books the flight, files the ticket, moves the money, merges the pull request. The output is no longer just text on a screen. It is an effect in the world, and effects in the world have consequences and need accountability in a way that a paragraph does not.
An agent talks to other agents. It delegates a subtask, hands off context, pulls a memory another agent wrote. Your data crosses not just system boundaries but agent boundaries, and the chain of who-told-whom is where responsibility either gets tracked or gets lost.
Each of these is a consent surface, and none of them is covered by “did you agree to be in the training set.”
Attribution flips
Here is the part that should change how people think about this. The closer data sits to the moment of use, the more attributable it becomes. We made that point about retrieval and in-context use last time. Agents live entirely at that end of the spectrum.
An agent’s behavior is, in principle, fully traceable. There is a prompt. There is a set of tool calls with inputs and outputs. There is a context window with known contents. There is a sequence of actions with timestamps. If you instrument the agent, you do not have to estimate which data influenced which action with an inverse Hessian and a prayer. You can read it off the trace.
This is a genuine inversion. Pre-training attribution is hard because the causal path is smeared across billions of parameters. Agentic attribution is tractable because the causal path is a log. The catch is that the log only exists if the protocol requires it to exist. Attribution in the agentic era is not a research problem. It is an instrumentation mandate, and standards are how mandates get enforced across vendors who would otherwise each log whatever they felt like, or nothing.
That is why LCS-003, the agent permission standard, is built around capabilities and an audit trail rather than vague trust. An agent should carry an explicit, bounded grant: these actions, this spending limit, this rate, these domains, expiring then, revocable now, and every action it takes against that grant should be recorded. The point is not bureaucracy. The point is that “the agent did something with my data” should always have an answer to “what, exactly, and on whose authority.”
The new hard problems
Making attribution tractable does not make the agentic world simple. It trades one set of hard problems for another, and the new ones are about accountability across boundaries rather than statistics inside a model.
Delegation. Agent A is allowed to act for you. It delegates to agent B, which delegates to a tool run by a third party. Whose consent governs B’s action? Does your grant flow down the chain, and if so, with what limits? When B does something wrong, the audit trail needs to walk back up to the authority that permitted it. Delegation without traceable chains is just laundering responsibility, and it is the default unless a standard says otherwise. This is the part of LCS-003 we argue about the most.
Shared memory. Agents are more useful when they remember, and more useful still when they can share what they remember so you stop re-explaining yourself to every new tool. But a memory has provenance. Who created it, from whose data, under what consent, with what time to live. LCS-004 treats memory as something with an owner and an access policy rather than a free-floating fact, precisely so that consent and attribution survive the handoff between agents. A memory pool with no provenance is a quiet way to strip consent off data by passing it around until nobody remembers where it came from.
Your standing model. As agents get more personal, there is pressure to build a persistent representation of you that any of them can consult. Done badly, that is every platform secretly assembling its own profile. Done well, it is a thing you own and grant access to deliberately. That is the bet behind LCS-002, the digital twin: one representation, owned by the person, with tiered and revocable access, instead of a dozen unaccountable shadow profiles.
Compensation gets more natural, not less
There is a quiet upside in all of this for the compensation question that training made so painful.
When influence is diffuse across weights, paying for it fairly is close to impossible, which is most of why nobody has. When data is used by an agent at a specific moment for a specific action, the usage is metered almost for free. There was a call. It read these inputs. It happened at this time. That is a billable event in a way that “your text marginally shifted a few million parameters” never was.
So the economics that felt hand-wavy in the training context become ordinary in the agentic one. Per-use, per-action settlement against the terms in a consent token is just invoicing. LCS-001 already carries the rate fields for it. The agentic era is where they start to mean something concrete rather than aspirational, because the meter finally exists.
This needs to ride alongside the tools, not behind them
Agents are standardizing how they reach the world. Tool and context protocols are becoming shared infrastructure, which is good, it is how the agent ecosystem avoids fragmenting into a hundred incompatible connectors. But a tooling standard answers “how does the agent call this?” It does not answer “was the agent allowed to, on whose authority, within what limits, and is it logged?”
That second question is the one a consent layer answers, and it has to sit right next to the tool call, not in a terms-of-service document nobody reads. Permission checked at the moment of action. Limits enforced by the runtime. The action recorded against the grant. When consent lives beside the tool call, attribution and accountability come almost for free, because the trace is already there. When it lives somewhere else, you are back to trusting that everyone behaves, which is how we got here.
Where this leaves us
The training-data fight will keep going, and it matters. But planning the entire future of AI consent around it is like writing careful rules for letters while everyone moves to live calls. The action has moved to agents that read, act, and remember in real time, and that move is good news in one specific way: it puts data use back at the surface, where it can be seen, scoped, logged, and paid for.
The harder, more interesting problems are now about accountability across delegation chains, provenance across shared memory, and who is answerable when an autonomous system acts. Those are exactly the problems LCS-002, LCS-003, and LCS-004 exist to take on, and they are the least finished parts of the standard, which is to say the parts where showing up early counts for the most.
If agentic systems are what you build or worry about, this is the work. The drafts are on GitHub, and the argument is open at the contribute page.
Want the next one in your inbox? Occasional updates, no spam.