<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://llmconsent.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://llmconsent.org/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-06-28T04:08:48+00:00</updated><id>https://llmconsent.org/feed.xml</id><title type="html">LLMConsent</title><subtitle>LLMConsent is an open, decentralized protocol for managing consent between humans and AI systems: verifiable permissions, attribution, and fair compensation for the data and identities that train and run AI.</subtitle><author><name>Subhadip Mitra</name></author><entry><title type="html">Consent doesn’t stop at training: data and attribution in the agentic era</title><link href="https://llmconsent.org/blog/2026/consent-in-the-agentic-era/" rel="alternate" type="text/html" title="Consent doesn’t stop at training: data and attribution in the agentic era" /><published>2026-06-28T02:00:00+00:00</published><updated>2026-06-28T02:00:00+00:00</updated><id>https://llmconsent.org/blog/2026/consent-in-the-agentic-era</id><content type="html" xml:base="https://llmconsent.org/blog/2026/consent-in-the-agentic-era/"><![CDATA[<p>We spent a lot of words <a href="/blog/2025/the-attribution-problem/">last time</a> on why
it is so hard to trace a model’s output back to the data baked into its weights.
The short version: during pre-training, influence is diffuse, the geometry fights
you, and exact per-output attribution is mostly a fantasy.</p>

<p>Here is the thing almost no one is saying out loud. That problem, the hard one, is
about a world that is already half gone. The frontier is not a model trained once
on a frozen corpus and then shipped. The frontier is an agent: a model wired to
tools, reading live data, taking actions, calling other agents, and carrying
memory between sessions. And in that world, the data question is different, the
consent question is different, and oddly enough, attribution gets easier in
exactly the place training made it hard.</p>

<h2 id="the-shift-in-one-sentence">The shift in one sentence</h2>

<p>Training-era consent asked, “may you learn from my data?” Agentic consent has to
ask, “may you read this, use it, and act on it right now, for this purpose, within
these limits, and who is accountable when you do?”</p>

<p>That is a different shape of question. It is about the present tense, not the past.
It is scoped to a task, not a corpus. And it is revocable in a way that a trained
weight never really is, because the agent has to come back and ask again the next
time it acts.</p>

<h2 id="where-the-data-actually-flows-now">Where the data actually flows now</h2>

<p>When people picture AI and data, they picture the scrape: a crawler hoovering up
the open web into a training set. Agents move data in ways that picture misses
entirely.</p>

<p>An agent reads your live context. It opens your documents, your email, your
calendar, your codebase, through tool calls and connectors. None of that is
training data. It never touches the weights. It flows into the context window,
shapes one set of actions, and is gone. The influence is real and immediate, and
it is also, crucially, observable, because it happened in a request you can log
rather than in a gradient update you cannot.</p>

<p>An agent takes actions. It sends the email, books the flight, files the ticket,
moves the money, merges the pull request. The output is no longer just text on a
screen. It is an effect in the world, and effects in the world have
consequences and need accountability in a way that a paragraph does not.</p>

<p>An agent talks to other agents. It delegates a subtask, hands off context, pulls
a memory another agent wrote. Your data crosses not just system boundaries but
agent boundaries, and the chain of who-told-whom is where responsibility either
gets tracked or gets lost.</p>

<p>Each of these is a consent surface, and none of them is covered by “did you agree
to be in the training set.”</p>

<h2 id="attribution-flips">Attribution flips</h2>

<p>Here is the part that should change how people think about this. The closer data
sits to the moment of use, the more attributable it becomes. We made that point
about retrieval and in-context use last time. Agents live entirely at that end of
the spectrum.</p>

<p>An agent’s behavior is, in principle, fully traceable. There is a prompt. There is
a set of tool calls with inputs and outputs. There is a context window with known
contents. There is a sequence of actions with timestamps. If you instrument the
agent, you do not have to estimate which data influenced which action with an
inverse Hessian and a prayer. You can read it off the trace.</p>
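
<p>To make "read it off the trace" concrete, here is a minimal sketch. It assumes an agent runtime that emits ordered events for tool calls and actions. The <code>TraceEvent</code> shape, the field names, and the source ids are hypothetical and are not taken from any LCS standard.</p>

```python
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    step: int                 # position in the agent's run
    kind: str                 # "tool_call" (data read) or "action" (effect)
    name: str                 # tool or action name
    inputs: list = field(default_factory=list)  # ids of sources read


def sources_before(trace, action_step):
    """List the source ids that entered context before a given action."""
    seen = []
    for event in sorted(trace, key=lambda e: e.step):
        if event.step >= action_step:
            break
        if event.kind == "tool_call":
            for source_id in event.inputs:
                if source_id not in seen:
                    seen.append(source_id)
    return seen


# Hypothetical run: two reads, one action, then a later read.
trace = [
    TraceEvent(1, "tool_call", "read_document", ["doc:quarterly-plan"]),
    TraceEvent(2, "tool_call", "read_email", ["mail:thread-7"]),
    TraceEvent(3, "action", "send_email"),
    TraceEvent(4, "tool_call", "read_calendar", ["cal:today"]),
]
print(sources_before(trace, action_step=3))
```

<p>This lists everything that was in context before the action, which is an upper bound on what shaped it rather than a causal measure. The point is only that the candidate set is known and finite.</p>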

<p>This is a genuine inversion. Pre-training attribution is hard because the causal
path is smeared across billions of parameters. Agentic attribution is tractable
because the causal path is a log. The catch is that the log only exists if the
protocol requires it to exist. Attribution in the agentic era is not a research
problem. It is an instrumentation mandate, and standards are how mandates get
enforced across vendors who would otherwise each log whatever they felt like, or
nothing.</p>

<p>That is why <a href="/standards/lcs-003/">LCS-003</a>, the agent permission standard, is
built around capabilities and an audit trail rather than vague trust. An agent
should carry an explicit, bounded grant: these actions, this spending limit, this
rate, these domains, expiring then, revocable now. Every action it takes against
that grant should be recorded. The point is not bureaucracy. The point is
that “the agent did something with my data” should always have an answer to “what,
exactly, and on whose authority.”</p>
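
<p>A bounded grant with an audit trail could look roughly like the sketch below. The <code>Grant</code> fields and the <code>authorize</code> function are assumptions made for illustration, not the LCS-003 schema, and rate limiting is left out to keep the example short.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    grant_id: str
    actions: frozenset        # permitted action names
    domains: frozenset        # permitted target domains
    spend_limit_cents: int
    expires_at: float         # unix timestamp
    revoked: bool = False
    spent_cents: int = 0
    audit: list = field(default_factory=list)


def authorize(grant, action, domain, cost_cents, now):
    """Check one action against the grant and record the decision."""
    if grant.revoked:
        reason = "revoked"
    elif now >= grant.expires_at:
        reason = "expired"
    elif action not in grant.actions:
        reason = "action not granted"
    elif domain not in grant.domains:
        reason = "domain not granted"
    elif grant.spent_cents + cost_cents > grant.spend_limit_cents:
        reason = "spend limit exceeded"
    else:
        reason = None
    allowed = reason is None
    if allowed:
        grant.spent_cents += cost_cents
    # Denials are logged too, so the trail also answers "what was attempted".
    grant.audit.append({"grant": grant.grant_id, "action": action,
                        "domain": domain, "cost_cents": cost_cents,
                        "at": now, "allowed": allowed, "reason": reason})
    return allowed
```

<p>Every call appends to the audit list whether or not it is allowed, so "what, exactly, and on whose authority" can be answered from the record alone.</p>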

<h2 id="the-new-hard-problems">The new hard problems</h2>

<p>Making attribution tractable does not make the agentic world simple. It trades one
set of hard problems for another, and the new ones are about accountability across
boundaries rather than statistics inside a model.</p>

<p><strong>Delegation.</strong> Agent A is allowed to act for you. It delegates to agent B, which
delegates to a tool run by a third party. Whose consent governs B’s action? Does
your grant flow down the chain, and if so, with what limits? When B does something
wrong, the audit trail needs to walk back up to the authority that permitted it.
Delegation without traceable chains is just laundering responsibility, and it is
the default unless a standard says otherwise. This is the part of LCS-003 we argue
about the most.</p>
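
<p>One way to keep delegation traceable is to let a delegated grant only narrow its parent, and to keep a pointer back to that parent. The sketch below assumes that design. The <code>Delegation</code> type and both functions are hypothetical and do not describe what LCS-003 currently specifies.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    holder: str                              # who may act under this grant
    actions: frozenset                       # what they may do
    parent: Optional["Delegation"] = None    # the grant this one derives from


def delegate(parent, holder, actions):
    """Create a child grant that can only narrow the parent's actions."""
    requested = frozenset(actions)
    if not requested.issubset(parent.actions):
        raise PermissionError("delegation would widen the parent's authority")
    return Delegation(holder, requested, parent)


def authority_chain(delegation):
    """Walk from the acting party back up to the original authority."""
    chain = []
    node = delegation
    while node is not None:
        chain.append(node.holder)
        node = node.parent
    return chain


root = Delegation("user", frozenset({"read", "send"}))
agent_a = delegate(root, "agent-a", {"read", "send"})
agent_b = delegate(agent_a, "agent-b", {"read"})
print(authority_chain(agent_b))
```

<p>Because each link holds its parent, an audit can start from whatever agent B did and walk up to the person who permitted it, and B cannot hand a downstream tool more than B itself was given.</p>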

<p><strong>Shared memory.</strong> Agents are more useful when they remember, and more useful
still when they can share what they remember so you stop re-explaining yourself to
every new tool. But a memory has provenance: who created it, from whose data, under
what consent, with what time to live. <a href="/standards/lcs-004/">LCS-004</a> treats memory
as something with an owner and an access policy rather than a free-floating fact,
precisely so that consent and attribution survive the handoff between agents. A
memory pool with no provenance is a quiet way to strip consent off data by passing
it around until nobody remembers where it came from.</p>
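
<p>As a rough illustration of a memory that keeps its provenance through a handoff, consider the sketch below. The <code>MemoryRecord</code> fields are assumptions chosen to mirror the questions above (creator, source data, consent, time to live) and are not the LCS-004 schema.</p>

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryRecord:
    content: str
    owner: str              # the person the memory is about
    created_by: str         # the agent that wrote it
    source_ids: tuple       # data the memory was derived from
    consent_ref: str        # id of the consent it was created under
    created_at: float       # unix timestamp
    ttl_seconds: float      # time to live
    readers: frozenset      # agents allowed to read it


def can_read(record, agent, now):
    """A read is allowed only for listed agents and only before expiry."""
    fresh = now < record.created_at + record.ttl_seconds
    return fresh and agent in record.readers


def hand_off(record, new_reader):
    """Share with another agent; every provenance field is carried along."""
    return replace(record, readers=record.readers | {new_reader})


memory = MemoryRecord("prefers morning flights", "user", "agent-a",
                      ("mail:thread-7",), "consent:123", 0.0, 3600.0,
                      frozenset({"agent-a"}))
shared = hand_off(memory, "agent-b")
```

<p>The handoff changes only who may read the record. Owner, sources, consent reference, and expiry travel with it, which is the property a provenance-free memory pool loses.</p>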

<p><strong>Your standing model.</strong> As agents get more personal, there is pressure to build a
persistent representation of you that any of them can consult. Done badly, that is
every platform secretly assembling its own profile. Done well, it is a thing you
own and grant access to deliberately. That is the bet behind
<a href="/standards/lcs-002/">LCS-002</a>, the digital twin: one representation, owned by the
person, with tiered and revocable access, instead of a dozen unaccountable shadow
profiles.</p>

<h2 id="compensation-gets-more-natural-not-less">Compensation gets more natural, not less</h2>

<p>There is a quiet upside in all of this for the compensation question that training
made so painful.</p>

<p>When influence is diffuse across weights, paying for it fairly is close to
impossible, which is most of why nobody has. When data is used by an agent at a
specific moment for a specific action, the usage is metered almost for free. There
was a call. It read these inputs. It happened at this time. That is a billable
event in a way that “your text marginally shifted a few million parameters” never
was.</p>

<p>So the economics that felt hand-wavy in the training context become ordinary in
the agentic one. Per-use, per-action settlement against the terms in a consent
token is just invoicing. <a href="/standards/lcs-001/">LCS-001</a> already carries the rate
fields for it. The agentic era is where they start to mean something concrete
rather than aspirational, because the meter finally exists.</p>
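
<p>A sketch of what "just invoicing" might mean in practice, assuming each logged call lists the source ids it read and each source has a flat per-use rate. The event shape and the rate table are hypothetical, and the actual LCS-001 rate fields may be structured differently.</p>

```python
from collections import defaultdict


def settle(usage_events, rate_cents_per_use):
    """Total what is owed to each source from logged usage events."""
    owed = defaultdict(int)
    for event in usage_events:
        for source_id in event["inputs"]:
            # Sources with no agreed rate accrue nothing in this sketch.
            owed[source_id] += rate_cents_per_use.get(source_id, 0)
    return dict(owed)


events = [
    {"call": "summarize", "inputs": ["doc:a", "doc:b"], "at": 1_000},
    {"call": "draft_reply", "inputs": ["doc:a"], "at": 1_060},
]
rates = {"doc:a": 2, "doc:b": 5}
print(settle(events, rates))
```

<p>Nothing here is estimated. The totals follow directly from the log, which is the contrast with paying for diffuse influence on weights.</p>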

<h2 id="this-needs-to-ride-alongside-the-tools-not-behind-them">This needs to ride alongside the tools, not behind them</h2>

<p>Agents are standardizing how they reach the world. Tool and context protocols are
becoming shared infrastructure, which is good: it is how the agent ecosystem
avoids fragmenting into a hundred incompatible connectors. But a tooling standard
answers “how does the agent call this?” It does not answer “was the agent allowed
to, on whose authority, within what limits, and is it logged?”</p>

<p>That second question is the one a consent layer answers, and it has to sit right
next to the tool call, not in a terms-of-service document nobody reads. Permission
checked at the moment of action. Limits enforced by the runtime. The action
recorded against the grant. When consent lives beside the tool call, attribution
and accountability come almost for free, because the trace is already there. When
it lives somewhere else, you are back to trusting that everyone behaves, which is
how we got here.</p>

<h2 id="where-this-leaves-us">Where this leaves us</h2>

<p>The training-data fight will keep going, and it matters. But planning the entire
future of AI consent around it is like writing careful rules for letters while
everyone moves to live calls. The action has moved to agents that read, act, and
remember in real time, and that move is good news in one specific way: it puts
data use back at the surface, where it can be seen, scoped, logged, and paid for.</p>

<p>The harder, more interesting problems are now about accountability across
delegation chains, provenance across shared memory, and who is answerable when an
autonomous system acts. Those are exactly the problems LCS-002, LCS-003, and
LCS-004 exist to take on, and they are the least finished parts of the standard,
which is to say the parts where showing up early counts for the most.</p>

<p>If agentic systems are what you build or worry about, this is the work. The drafts
are on <a href="https://github.com/LLMConsent/llmconsent-standards">GitHub</a>, and the argument is open at
<a href="/contribute/">the contribute page</a>.</p>]]></content><author><name>Subhadip Mitra</name></author><category term="agents" /><category term="attribution" /><summary type="html"><![CDATA[We spent a lot of words last time on why it is so hard to trace a model’s output back to the data baked into its weights. The short version: during pre-training, influence is diffuse, the geometry fights you, and exact per-output attribution is mostly a fantasy.]]></summary></entry><entry><title type="html">The attribution problem: tracing an output back to your data</title><link href="https://llmconsent.org/blog/2025/the-attribution-problem/" rel="alternate" type="text/html" title="The attribution problem: tracing an output back to your data" /><published>2025-12-28T00:00:00+00:00</published><updated>2025-12-28T00:00:00+00:00</updated><id>https://llmconsent.org/blog/2025/the-attribution-problem</id><content type="html" xml:base="https://llmconsent.org/blog/2025/the-attribution-problem/"><![CDATA[<p>If you want to compensate people when their data is used to build a model, you
eventually run into a question that sounds simple and is not: which data was
used, and how much did it matter? Call it the attribution problem. Most public
debate skips past it, because it is easier to argue about whether scraping is
fair than to admit that even if everyone agreed it should be paid for, we would
not currently know how to divide the money.</p>

<p>This post is an attempt to lay the problem out plainly, separate the parts that
are genuinely hard from the parts that just sound hard, and explain the stance
LLMConsent takes as a result. It gets technical. That is on purpose. A consent
protocol that hand-waves over attribution is selling something.</p>

<h2 id="two-questions-that-get-confused">Two questions that get confused</h2>

<p>The first thing to do is split “was my data used” into two very different
questions, because they have very different answers.</p>

<p><strong>Provenance</strong>, or membership, asks: was this specific data point part of the
training set? This is a question about a fact in the past: did the file enter the
pipeline?</p>

<p><strong>Influence</strong>, or attribution proper, asks: how much did this data point shape
the model’s parameters, and through them, a particular output? This is a question
about cause and effect inside a system with hundreds of billions of moving parts.</p>

<p>These get blurred together constantly, and it matters, because provenance is
mostly tractable and influence mostly is not. If you can keep them separate, a
lot of the confusion clears up.</p>

<h2 id="provenance-is-the-easy-half">Provenance is the easy half</h2>

<p>If you control the training pipeline, you know what went in. You can hash every
document at ingestion, commit those hashes to a log, and later prove that a given
file was or was not part of a given training run. Cryptographically this is
ordinary. Merkle trees and signed manifests have done this for decades in other
contexts. There is no deep research problem in proving “this document was in
corpus version 7.”</p>
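
<p>For readers who have not seen it done, here is a minimal Merkle inclusion proof using only the Python standard library. It is a sketch under simplifying assumptions: odd levels duplicate their last node, which a production scheme would replace with a sturdier rule, and the function names are illustrative.</p>

```python
import hashlib


def leaf_hash(document: bytes) -> bytes:
    # Distinct prefixes keep leaf hashes and inner-node hashes apart.
    return hashlib.sha256(b"\x00" + document).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_proof(leaves, index):
    """Return (root, proof) for the leaf at `index`."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # pad odd levels (simplification)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, is_left)
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf, proof, root):
    """Recompute the root from one leaf and its sibling path."""
    current = leaf
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            current = node_hash(sibling, current)
        else:
            current = node_hash(current, sibling)
    return current == root


corpus = [b"doc one", b"doc two", b"doc three"]
leaves = [leaf_hash(d) for d in corpus]
root, proof = merkle_proof(leaves, 2)
print(verify(leaf_hash(b"doc three"), proof, root))
```

<p>Only the root needs to be published per corpus version. Anyone holding a document and its short sibling path can then check inclusion without seeing the rest of the corpus.</p>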

<p>The harder version of provenance is when you do not control the pipeline and want
to find out after the fact whether your data was in someone’s model. That is
membership inference, and it is a statistical attack rather than a clean proof.
You probe the model, look at how confidently it predicts your text, and infer
membership from the gap between how it treats data it has seen and data it has
not. It works better on outliers and memorized content, worse on ordinary text
that looks like everything else. It gives you a probability, not a receipt.</p>

<p>For a consent protocol the lesson is to lean on the clean version. If consent is
checked and recorded at the moment data enters a pipeline, provenance stops being
a forensic question and becomes a logged fact. You do not have to reverse engineer
membership from the outside if permission was verified on the way in.</p>

<h2 id="influence-is-the-hard-half">Influence is the hard half</h2>

<p>Now the difficult question. A model produced a sentence. Your blog post was
somewhere in the trillions of tokens it trained on. How much credit does your
post deserve for that sentence?</p>

<p>The honest baseline answer is that there is a precise definition of influence and
it is almost never computable. The definition is counterfactual: the influence of
your data point is the difference between the model you got and the model you
would have gotten if that point had been left out. Train with it, train without
it, compare. This is the leave-one-out ideal, and it is the thing every practical
method is trying to approximate, because actually doing it means retraining a
frontier model once per data point. Nobody is retraining a model a trillion times
to settle a royalty.</p>
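
<p>The counterfactual is easy to write down on a toy problem, which is a useful way to see why it does not scale. The sketch below assumes a one-parameter least-squares model fitted in closed form. It is an illustration of leave-one-out influence, not a method anyone would apply to a language model.</p>

```python
def fit(points):
    """Least-squares slope for y = theta * x (line through the origin)."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx


def loss(theta, point):
    x, y = point
    return (theta * x - y) ** 2


def loo_influence(train, test_point):
    """Change in test loss when each training point is left out in turn.

    Positive: removing the point hurts, so the point was helping.
    Negative: removing the point helps, so the point was hurting.
    """
    full_loss = loss(fit(train), test_point)
    scores = []
    for i in range(len(train)):
        without_i = train[:i] + train[i + 1:]   # one refit per point
        scores.append(loss(fit(without_i), test_point) - full_loss)
    return scores


train = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)]   # the last point is an outlier
print(loo_influence(train, test_point=(1.0, 1.0)))
```

<p>Note the loop: one full refit per training point. That is trivial for three points and a closed-form model, and it is exactly the cost that becomes absurd at the scale of a pre-training corpus.</p>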

<p>So the field builds estimators. A few worth knowing by name:</p>

<ul>
  <li>
    <p><strong>Influence functions</strong>, introduced to modern machine learning by Koh and Liang in 2017, estimate the leave-one-out effect without retraining, using the model’s gradients and a term involving the inverse Hessian of the loss. The math is clean. The cost is brutal, because that Hessian has one entry for every pair of parameters, so its size grows with the square of the parameter count, and you have to approximate it hard to run it on anything large. Anthropic’s 2023 work scaled influence functions to large language models using an approximation called EK-FAC, and one of their findings is instructive: the sequences that influence a given output are often related by theme and reasoning pattern rather than by surface wording. Influence is real, but it is diffuse and a little alien. It does not point at one source.</p>
  </li>
  <li>
    <p><strong>TracIn</strong> takes a different route, tracing influence by following the dot product of gradients across training checkpoints. Intuitively, a training example influenced an output if updating on it would have reduced the loss on that output. Cheaper in some setups, still approximate.</p>
  </li>
  <li>
    <p><strong>Datamodels</strong>, from Ilyas and collaborators in 2022, fit a surrogate that predicts a model’s behavior as a function of which training examples were included. It treats the model as something to be regressed against its own training set. Powerful, and also expensive to construct, because you train many models on many subsets to fit the surrogate.</p>
  </li>
</ul>
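
<p>To give a feel for the gradient-product idea behind TracIn, here is a toy sketch for a model with a single scalar weight, where the dot product of gradients reduces to an ordinary product. The checkpoints, learning rate, and data points are made up for illustration, and real use involves high-dimensional gradients and many more checkpoints.</p>

```python
def grad(w, point):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    x, y = point
    return 2.0 * (w * x - y) * x


def tracin_score(checkpoints, learning_rate, train_point, test_point):
    """Sum over checkpoints of lr * grad(train) * grad(test).

    A positive score means a gradient step on the training point would
    also have lowered the loss on the test point at those checkpoints.
    """
    return sum(learning_rate * grad(w, train_point) * grad(w, test_point)
               for w in checkpoints)


checkpoints = [0.0, 0.5]      # hypothetical saved weights during training
test_point = (1.0, 1.0)
print(tracin_score(checkpoints, 0.1, (2.0, 2.0), test_point))   # agrees with test
print(tracin_score(checkpoints, 0.1, (1.0, -1.0), test_point))  # pulls the other way
```

<p>The first training point is consistent with the test point and scores positive. The second pulls the weight in the opposite direction and scores negative. No retraining is needed, but the answer depends on which checkpoints were saved.</p>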

<p>Every one of these is an approximation of the counterfactual, each with its own
failure modes, and none of them is cheap enough to run per output per data point
at the scale of a real product. That is not a temporary engineering gap. The
geometry of how these models store information is working against you.</p>

<h2 id="why-the-geometry-fights-back">Why the geometry fights back</h2>

<p>A neural network does not file your blog post in a drawer labeled “your blog
post.” Training distributes what it learns across the weights. A single concept is
spread over many neurons, and a single neuron participates in many concepts. The
interpretability literature calls the first property distributed representation
and the second polysemanticity, and recent work on superposition shows models
deliberately pack more features than they have dimensions, overlapping them
because they can usually get away with it.</p>

<p>The practical consequence is that there is no spot in the weights you can point to
and say “that came from this person.” The information is smeared. Your data nudged
millions of parameters by tiny amounts, and so did everyone else’s, and the
output you care about is the joint result of all those nudges passed through a
deeply nonlinear function. Asking for one source’s share of one output is a bit
like asking which raindrop is responsible for a particular eddy in a river.</p>

<h2 id="the-exception-memorization">The exception: memorization</h2>

<p>There is one regime where attribution gets dramatically easier, and it is the one
that makes headlines. Sometimes a model does not generalize from a piece of data. It
memorizes it and can reproduce it close to verbatim. Work by Carlini and others
has shown you can extract memorized training data from large models, that larger
models memorize more, and that the single biggest driver of memorization is
duplication. Data that appears many times in the corpus is far more likely to be
regurgitated, which is also why deduplicating training data measurably reduces
memorization.</p>

<p>Memorized content is attributable almost by definition. If the model emits your
text verbatim, the link is not statistical, it is visible. This is the easy and
legally loud case. But it is the minority case. Most of what a model does is
generalization, recombination, the diffuse kind of influence that the methods
above can only estimate. Building an entire compensation regime on the memorized
tail would miss most of how these systems actually use data.</p>

<h2 id="after-pre-training-the-ground-shifts">After pre-training, the ground shifts</h2>

<p>Everything so far is about pre-training, the giant first pass over the open
corpus. The picture changes, somewhat for the better, in the stages that come
after.</p>

<p><strong>Fine-tuning</strong> runs on smaller, curated datasets. The counterfactual is still
the right definition and still not free, but the numbers are friendlier. With
thousands or millions of examples instead of trillions, influence estimates are
more stable and the set of candidate sources is bounded. Attribution here is
hard but no longer hopeless.</p>

<p><strong>Reinforcement learning from human feedback</strong> is harder again in a different way.
The training signal is preference data, humans choosing between outputs, and that
signal shapes behavior globally rather than injecting retrievable facts. Tracing a
specific model behavior back to a specific preference label is murky, and the
labelers themselves are a data source whose contribution almost no one accounts
for.</p>

<p><strong>Retrieval-augmented generation</strong> is the pleasant surprise. When a model answers
by pulling documents into its context at run time, the documents it used are not a
mystery. They are right there in the request. Attribution becomes logging. This is
why citations in retrieval systems are tractable while citations for pre-training
knowledge are not. If you want clean, per-use attribution today, retrieval is
where it already exists.</p>

<p><strong>In-context use</strong> is the trivial case. Whatever you put in the prompt, the model
saw, and you know exactly what you put there. That is not really training, but it
is increasingly how data reaches models in practice, and it points at where this
is all heading.</p>

<p>Notice the pattern. The closer data sits to the moment of use, the more
attributable it becomes. The deeper it is baked into the weights, the less. A
consent system should take that seriously instead of pretending one mechanism
fits all of it.</p>

<h2 id="what-llmconsent-does-about-it">What LLMConsent does about it</h2>

<p>Given all of the above, designing the protocol around perfect per-output
attribution would be designing around something that does not exist. So we do not.</p>

<p>We separate the two questions on purpose. Provenance is treated as a verifiable
fact, established when consent is checked at ingestion, not reconstructed later by
attack. <a href="/standards/lcs-001/">LCS-001</a> is built around that grant and check
moment, so the record of what was permitted exists before any training happens,
not as an afterthought.</p>

<p>For influence, we bound rather than measure. A consent token carries a maximum
influence ceiling, expressed in basis points, that caps how much any single
source is allowed to shape a model. Capping influence is far more achievable than
measuring it exactly after the fact, and it turns an impossible accounting problem
into a tractable policy one. You do not need to know that a source contributed
0.0007 of an output if the terms already said it may contribute at most a set
fraction.</p>
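
<p>The unit arithmetic is worth spelling out. One basis point is one hundredth of a percent, or 1/10,000. The sketch below converts a ceiling in basis points to a fraction and compares it with a share. The function names are hypothetical, and how a share would be estimated or enforced in training is deliberately left open here.</p>

```python
BASIS_POINTS_PER_UNIT = 10_000   # 1 basis point = 0.01% = 1/10,000


def bps_to_fraction(bps):
    return bps / BASIS_POINTS_PER_UNIT


def within_ceiling(share, max_influence_bps):
    """True if a source's share does not exceed its agreed ceiling."""
    return share <= bps_to_fraction(max_influence_bps)


# A ceiling of 10 basis points is 0.001, so a share of 0.0007 fits under it.
print(bps_to_fraction(10))
print(within_ceiling(0.0007, max_influence_bps=10))
print(within_ceiling(0.0007, max_influence_bps=5))
```

<p>The policy question is then only whether the pipeline stayed under the agreed number, not what the exact contribution was.</p>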

<p>Where attribution genuinely is available (retrieval, memorization, fine-tuning
sets), the protocol expects implementations to use it, and the economics can be
exact in those cases. Where it is not, the model is consent at ingestion plus
bounded influence plus usage-based settlement, rather than a fantasy of tracing
every token to a payee. We would rather ship a mechanism that is honest about its
approximations than a precise-sounding one that quietly cannot be implemented.</p>

<p>There is also revocation, which drags in another genuinely open problem: machine
unlearning. Removing the effect of a data point from an already-trained model is
its own research frontier, with approaches that range from full retraining of
data shards to approximate methods with real limits. The token in LCS-001 can
signal that consent is revocable and that unlearning is requested, but we are not
going to pretend the ecosystem can perfectly forget on command yet. It cannot.
Naming the gap in the standard is better than hiding it.</p>

<h2 id="the-honest-summary">The honest summary</h2>

<p>Provenance is a solved problem if you record consent at the right moment.
Influence during pre-training is, for now, fundamentally approximate, and the
shape of these models suggests it will stay that way. Attribution improves the
closer you get to the point of use, which is exactly where AI is moving with
retrieval and agents. A serious consent protocol should prove what it can prove,
bound what it cannot measure, and refuse to dress up estimates as receipts.</p>

<p>If you work on training data attribution, influence estimation, or unlearning, we
would genuinely like to be wrong about the hard parts. The standards are on
<a href="https://github.com/LLMConsent/llmconsent-standards">GitHub</a>, and the place to argue is
<a href="/contribute/">an issue or a proposal</a>.</p>]]></content><author><name>Subhadip Mitra</name></author><category term="attribution" /><category term="research" /><summary type="html"><![CDATA[If you want to compensate people when their data is used to build a model, you eventually run into a question that sounds simple and is not: which data was used, and how much did it matter? Call it the attribution problem. Most public debate skips past it, because it is easier to argue about whether scraping is fair than to admit that even if everyone agreed it should be paid for, we would not currently know how to divide the money.]]></summary></entry><entry><title type="html">Introducing LLMConsent</title><link href="https://llmconsent.org/blog/2025/introducing-llmconsent/" rel="alternate" type="text/html" title="Introducing LLMConsent" /><published>2025-09-15T00:00:00+00:00</published><updated>2025-09-15T00:00:00+00:00</updated><id>https://llmconsent.org/blog/2025/introducing-llmconsent</id><content type="html" xml:base="https://llmconsent.org/blog/2025/introducing-llmconsent/"><![CDATA[<p>For the last two years the conversation about AI and data has been stuck in the
same loop. A model ships. Someone notices their work is in it. A lawsuit gets
filed, or a company adds an opt-out form that almost nobody finds, and everyone
moves on until the next model ships. Nothing about the underlying machinery
changes, because there is no underlying machinery. There is no agreed way for a
person to say “yes, you can use this, under these terms” and for a system to
check that answer before it acts.</p>

<p>That gap is what we are trying to close. Today we are releasing the first draft
of <strong>LLMConsent</strong>, an open protocol for consent between humans and AI systems.</p>

<h2 id="why-this-is-a-protocol-not-a-product">Why this is a protocol, not a product</h2>

<p>The instinct most people have is to build a product. A consent dashboard, a
licensing marketplace, a “data union” with a slick app. We think that instinct
is wrong, or at least incomplete, and it is worth saying why up front.</p>

<p>A product governs whoever uses that product. If one company builds the best
consent tool in the world, it still only covers the data flowing through that
one company. The moment your data crosses into a different model, a different
vendor, a different jurisdiction, the rules reset to zero. Consent that cannot
travel with the data is not really consent. It is a setting on someone else’s
server.</p>

<p>The things that actually solved this class of problem on the internet were not
products. They were agreements. TCP/IP is an agreement about how to move packets.
HTTP is an agreement about how to move documents. TLS is an agreement about how
to do it without everyone watching. None of them are owned. Their value came
precisely from the fact that everyone could implement them and nobody could
revoke them.</p>

<p>AI is missing an agreement of that kind, and the missing one is about consent.
Who is allowed to use what, for which purpose, for how long, and at what price.
LLMConsent is our proposal for that agreement.</p>

<h2 id="what-it-actually-is">What it actually is</h2>

<p>LLMConsent is a set of open standards. The core of it is a consent token: a
signed, checkable statement that scopes how a piece of data or a person’s digital
representation may be used. Train on it or do not. Run inference with it or do
not. Let an agent act on it or do not. Bound how much any single source can
influence a model. Set an expiry. Attach a price. Revoke it later if you change
your mind.</p>
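
<p>To show what "signed and checkable" means mechanically, here is a small sketch using Python's standard library. It makes two loud assumptions: the claim names are invented for illustration rather than taken from LCS-001, and it uses an HMAC with a shared key as a stand-in for the public-key signature a real deployment would need, because the standard library has no public-key signing.</p>

```python
import hashlib
import hmac
import json


def _canonical(claims):
    # Stable serialization so the same claims always produce the same bytes.
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def sign_token(claims, key):
    tag = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": tag}


def check_token(token, key, use, now):
    """Verify the tag first, then the scope, expiry, and revocation flag."""
    expected = hmac.new(key, _canonical(token["claims"]),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return False                       # altered or signed with another key
    claims = token["claims"]
    if claims["revoked"] or now >= claims["expires_at"]:
        return False
    return use in claims["allowed_uses"]


key = b"demo-key-not-for-production"
claims = {"subject": "doc:a", "allowed_uses": ["inference"],
          "expires_at": 100, "revoked": False}
token = sign_token(claims, key)
print(check_token(token, key, use="inference", now=50))
print(check_token(token, key, use="training", now=50))
```

<p>The check happens before the system acts, and a token whose claims were edited after signing fails verification regardless of what the edited claims say.</p>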

<p>There are four core standards in this first release.</p>

<ul>
  <li><strong><a href="/standards/lcs-001/">LCS-001</a></strong> defines the consent token itself and the grant, check, and revoke lifecycle that everything else is built on.</li>
  <li><strong><a href="/standards/lcs-002/">LCS-002</a></strong> describes a digital twin, a persistent, user-owned model that AI systems can reference with permission instead of each one rebuilding a private profile of you from scratch.</li>
  <li><strong><a href="/standards/lcs-003/">LCS-003</a></strong> covers agent permissions: what an autonomous agent is allowed to do on your behalf, with spending limits, rate limits, delegation rules, and an audit trail.</li>
  <li><strong><a href="/standards/lcs-004/">LCS-004</a></strong> handles memory shared across agents, so context can follow you between systems without you losing control of what gets remembered.</li>
</ul>

<p>They build on each other in that order. You can read all four today. They are
drafts, and they will change, and that is the point.</p>

<h2 id="what-it-is-not">What it is not</h2>

<p>Because of the company this protocol keeps, a few clarifications are worth making
plainly.</p>

<p>This is not a tradable token, a coin, or an investment of any kind. There is nothing to
buy. The protocol uses cryptography because consent needs to be verifiable and
signatures are how you verify things, but the goal is consent management, not
speculation.</p>

<p>This is not a single company’s API with an open-source sticker on it. There are
no admin keys. No one, including us, can freeze, seize, or quietly rewrite
someone’s consent. If the project succeeds, it should outlive any of its current
maintainers, the same way HTTP outlived the people who first wrote it down.</p>

<p>And it is not finished. The standards are early. The reference SDKs are mostly
unwritten. We are publishing now, in this state, on purpose, because a standard
written behind closed doors and revealed as a finished thing is not a standard.
It is a press release. We would rather get the hard questions early.</p>

<h2 id="on-licensing">On licensing</h2>

<p>The code is under MIT. The standards and documentation are under CC BY 4.0. The
split is deliberate. Implementations should be as easy to build on as possible,
and specifications should be free to quote, translate, and fork as long as the
attribution stays intact. Nobody should have to ask us for permission to
implement a protocol whose entire purpose is permission.</p>

<h2 id="how-it-is-governed">How it is governed</h2>

<p>The process is modeled on the ones that built the open internet: the IETF’s RFC
process, the W3C, and the BIP and EIP improvement-proposal traditions. Anyone can
propose a standard. Proposals are reviewed in public. Things move forward by
rough consensus and working code, not by a vote we control. Accepted standards
become immutable, and changing one means writing a new proposal that supersedes
it, with the reasoning on the record.</p>

<p>We are in the bootstrap phase, and we are honest about what that means. Right now
there is a small group doing the early work, and a founding maintainer with a
tie-breaking vote that is meant to sunset. The plan is to give that power away as
the community grows, not to accumulate it.</p>

<h2 id="what-we-are-asking-for">What we are asking for</h2>

<p>If you have read this far, you are probably the kind of person we need.</p>

<p>Read <a href="/standards/lcs-001/">LCS-001</a> and tell us where it is wrong. Try to
implement a piece of it and tell us where the spec is ambiguous or impossible.
Build an SDK in a language we have not covered. Open a proposal for the standard
we forgot. Argue with us in the open.</p>

<p>The questions in front of us are genuinely unsolved. How do you attribute a
model’s behavior back to the data that shaped it, and how much of that is even
possible? How should compensation work when influence is diffuse? What does
consent mean for an agent that acts in real time rather than a model trained once?
We do not have clean answers to all of these. We have a structure to work them out
in, and an insistence that the work happen in public.</p>

<p>This is the ground floor. Come build it with us.</p>

<p>The specs are on <a href="https://github.com/LLMConsent/llmconsent-standards">GitHub</a>. Conversation
happens on <a href="https://discord.gg/c2tjrZKcbR">Discord</a>. And if you just want to follow
along, the email list is at the bottom of this page.</p>]]></content><author><name>Subhadip Mitra</name></author><category term="announcement" /><summary type="html"><![CDATA[For the last two years the conversation about AI and data has been stuck in the same loop. A model ships. Someone notices their work is in it. A lawsuit gets filed, or a company adds an opt-out form that almost nobody finds, and everyone moves on until the next model ships. Nothing about the underlying machinery changes, because there is no underlying machinery. There is no agreed way for a person to say “yes, you can use this, under these terms” and for a system to check that answer before it acts.]]></summary></entry></feed>