How the Grounded-AI Pledge is enforced in code
The Pledge says every drafted answer links to a source in your KB. Here's how the drafting engine enforces that — with refusals, not with model hygiene.
The Grounded-AI Pledge says that every answer PursuitAgent drafts links to the exact source document and page in your knowledge base. It is a contractual commitment — we publish the clause, we cap our own ability to change it, and a customer can terminate without penalty if we do.
Customers ask, reasonably: how do you actually enforce that? The answer is that we enforce it as a refusal at draft time, not as a post-hoc check.
This post walks through the mechanism.
The problem we’re not solving
Stanford HAI’s 2024 study of commercial legal RAG tools — Lexis+ AI, Westlaw AI, Ask Practical Law — found hallucination rates between 17% and 33% even with retrieval. Citations were present in the outputs. The citations often didn’t support the claims attached to them. In RAG literature, this is known as the source-attribution vs. claim-support gap: the model cites a passage, but the sentence it cites doesn’t actually back up the sentence it generated.
We are not solving that gap with better prompting. Better prompting reduces the rate, but it does not close it. The only way to close it is to make the generator refuse to produce a sentence that isn’t demonstrably grounded in retrieved content.
That is what the drafting engine does.
The engine’s contract
The drafting engine accepts a question (from an RFP) and a set of retrieved candidate blocks from the customer’s KB. It returns either a draft sentence or a refusal.
A draft sentence has three invariants:
- It is accompanied by a pointer to a specific KB block (by block ID, version, document ID, and page).
- Every substantive noun phrase in the sentence has a provenance trace to a span inside that block.
- The block’s content, when read independently, supports the sentence.
The first two are bookkeeping. The third is the hard one, and it is where most grounded-AI systems fail: they hand you the pointer without verifying that what it points to supports the generated text.
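As a sketch, the first two invariants are just data carried on the draft object. The type below is illustrative (the names `ProvenanceSpan`, `GroundedDraft`, and their fields are assumptions, not the engine's actual schema); the third invariant can't be captured in a type and is checked by the verification stage.

```typescript
// Illustrative types for the invariants (field names are assumptions,
// not the production schema).
interface ProvenanceSpan {
  start: number; // character offset into the block's text
  end: number;
}

interface GroundedDraft {
  sentence: string;
  // Invariant 1: pointer to a specific KB block.
  blockId: string;
  blockVersion: string;
  documentId: string;
  page: number;
  // Invariant 2: every substantive noun phrase traces to a span in the block.
  provenance: Record<string, ProvenanceSpan>;
  // Invariant 3 (the block supports the sentence) is not expressible here;
  // it is enforced by the verification stage.
}
```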
Where we enforce
Three places.
Stage 1 — Retrieval confidence gate
The retrieval step returns blocks with a relevance score. We gate on a floor. If no block clears the floor, the engine doesn’t try to draft — it returns UNGROUNDED with a list of candidate blocks that were close but not close enough. A human operator sees exactly what the engine saw and why it stopped.
```typescript
export type DraftResult =
  | { kind: "drafted"; sentence: string; blockId: string; blockVersion: string; pageRef: PageRef }
  | { kind: "ungrounded"; candidates: CandidateBlock[]; reason: UngroundedReason };

const RETRIEVAL_FLOOR = 0.62; // tuned on our held-out RFP corpus

export async function draftAnswer(
  question: string,
  candidates: CandidateBlock[],
): Promise<DraftResult> {
  const top = candidates.filter((c) => c.score >= RETRIEVAL_FLOOR);
  if (top.length === 0) {
    return {
      kind: "ungrounded",
      candidates,
      reason: "retrieval-floor-not-met",
    };
  }
  // ...continue to the drafting step
}
```
Stage 2 — Constrained draft
With one or more qualifying blocks, the engine drafts. The draft step is a Claude-family call with a system prompt that is not a typical “answer the question helpfully” prompt. It is a rewrite prompt:
You are given a QUESTION and a SOURCE block. Rewrite the SOURCE into a sentence that answers the QUESTION. Do not introduce facts that are not in the SOURCE. If the SOURCE does not contain enough information to answer the QUESTION, respond with the literal string REFUSE.
Rewrite prompts fabricate far less than generation prompts. They are not a silver bullet — the model can still add small embellishments — but the failure mode is “paraphrased too aggressively” instead of “invented a number.”
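The rewrite step can be sketched as follows. `callModel` stands in for the real Claude API client and the prompt is condensed from the one above; both are illustrative, not the production implementation.

```typescript
// Stage 2 sketch: rewrite-only drafting with an explicit refusal path.
type ModelCall = (system: string, user: string) => Promise<string>;

const REFUSE = "REFUSE";

export async function draftFromSource(
  question: string,
  sourceText: string,
  callModel: ModelCall,
): Promise<string | null> {
  const system =
    "You are given a QUESTION and a SOURCE block. Rewrite the SOURCE into " +
    "a sentence that answers the QUESTION. Do not introduce facts that are " +
    "not in the SOURCE. If the SOURCE does not contain enough information " +
    `to answer the QUESTION, respond with the literal string ${REFUSE}.`;
  const user = `QUESTION: ${question}\n\nSOURCE: ${sourceText}`;
  const output = (await callModel(system, user)).trim();
  // A literal REFUSE from the model becomes null, which the caller turns
  // into an "ungrounded" result.
  return output === REFUSE ? null : output;
}
```

Returning `null` rather than an empty string keeps the refusal explicit at the type level, so the caller can't accidentally ship it as a sentence.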
Stage 3 — Verification
After the draft returns, we run a verification step that is independent of the draft model. For each substantive noun phrase in the drafted sentence, we ask: does a span in the SOURCE block entail this phrase? If any phrase fails, the whole sentence is refused.
```typescript
const phrases = extractNounPhrases(draftedSentence);
for (const phrase of phrases) {
  const entailed = await verifier.entails(phrase, sourceBlock.text);
  if (!entailed) {
    return { kind: "ungrounded", candidates, reason: "entailment-failure" };
  }
}
```
The verifier is a smaller model tuned for entailment. It is cheap, and it runs on every sentence the drafting engine emits. A question that retrieves well but whose best block doesn't fully entail the drafted sentence is still refused at this stage.
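The shape of the `verifier.entails` call can be illustrated with a crude lexical stand-in. To be clear, the production verifier is a tuned entailment model; `entailsStub` below only sketches the interface, not the technique.

```typescript
// Crude lexical stand-in for verifier.entails (illustrative only; the
// real verifier is a small model tuned for entailment). Returns true iff
// every word of the phrase appears in the source text.
function entailsStub(phrase: string, sourceText: string): boolean {
  const src = sourceText.toLowerCase();
  return phrase
    .toLowerCase()
    .split(/\s+/)
    .every((word) => src.includes(word));
}
```

A lexical check like this would pass trivial paraphrase and fail everything else, which is roughly the opposite of what you want from a real entailment model; the point is only that the interface is a boolean per phrase.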
What the refusal looks like in the UI
When the engine refuses, the drafting UI doesn't leave the reviewer staring at an empty textarea. It surfaces the candidate blocks with their scores, the reason for the refusal, and a suggested next step: add a KB block, escalate to an SME, or accept a partial answer covering only what the SOURCE supports.
An operator can override. Overrides are logged, and the log surfaces in the Review tab when a reviewer approves the response. The Pledge does not promise a system with no human override. It promises a system that never ships a claim labeled as grounded when the claim isn’t.
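The override log can be sketched as a record type. Field names here are assumptions, not the production schema; the `UngroundedReason` values are the ones the refusal results above use.

```typescript
// Illustrative override record (field names are assumptions).
type UngroundedReason = "retrieval-floor-not-met" | "entailment-failure";

interface OverrideRecord {
  draftId: string;
  operatorId: string;
  engineReason: UngroundedReason; // why the engine refused
  operatorNote: string; // justification, surfaced in the Review tab
  overriddenAt: string; // ISO-8601 timestamp
}
```

Keeping the engine's reason on the record is the important part: the reviewer who approves the response sees not just that an override happened, but which gate the override bypassed.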
What breaks this
Three things we are still working on.
Numeric facts. Entailment for “we have 99.9% uptime” against a block that says “99.94%” can pass or fail depending on how loose the verifier is tuned. We err on the strict side: numeric mismatches fail. This is noisy and sometimes requires the human to notice that the SOURCE has the more precise number.
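The strict numeric rule can be illustrated as a verbatim-match check. This is a simplification (the function name and regex are assumptions; the real check lives inside the verifier), but it captures the behavior: any number in the draft must appear exactly in the source.

```typescript
// Strict numeric gate, simplified: every number in the draft must appear
// verbatim in the source, or the sentence fails. (Illustrative; the
// production check is part of the entailment verifier.)
function numbersSupported(draft: string, sourceText: string): boolean {
  const numbers = draft.match(/\d+(?:\.\d+)?%?/g) ?? [];
  return numbers.every((n) => sourceText.includes(n));
}
```

Under this rule, "we have 99.9% uptime" fails against a source that says "99.94%", which is exactly the noisy-but-strict behavior described above: the refusal is correct, but a human has to notice that the SOURCE actually holds the more precise number.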
Compound claims. A drafted sentence can state two things, one of which is entailed by one block and the other by another block. The current stack doesn’t compose entailment across blocks. For high-tier pillars we’re running multi-block entailment in a research branch; it’s not in the production path yet.
Synonyms and tense. “We support SOC 2 Type II” vs. “SOC 2 Type II compliance is maintained annually.” The verifier handles common paraphrase, but corner cases slip. These are the failures we actually want customers to flag.
We will write a follow-up on each. The engineering log is a place where “here’s what’s still broken” is a feature, not a failure.
The short version
The Pledge is enforced at draft time, not at review time. It is enforced by three gates — retrieval floor, rewrite-only drafting, and entailment verification. A failure at any gate produces a refusal, not a fabricated sentence. Refusals are a feature.
If your current proposal software doesn’t expose those gates to the user, that is probably why you don’t fully trust its output.