Field notes

Linking debrief notes to the specific answer blocks that failed

How every debrief comment becomes a KB-block edit suggestion. The two-pass linker that walks from a free-text comment to the exact block that sourced the failed answer, with the SQL and the heuristics.

The PursuitAgent engineering team · Engineering · 5 min read

A debrief comment is prose. A KB block edit is structured. The work in between is a small but stubborn matching problem: given a sentence like “the encryption section read like it was written for a previous compliance posture,” find the exact KB block that sourced the encryption section in that proposal.

This post is the linker. Two passes, one heuristic, one model. Where each is useful, where each fails, and the SQL the dashboard runs on every comment.

The provenance trail we already have

Every drafted answer in PursuitAgent carries provenance back to the KB blocks it was composed from. We covered the structure in the answer provenance graph post. For a given proposal section, we can list the blocks that sourced it, the spans inside each block that were used, and the version hash at the time of drafting.

That trail is the backbone of the linker. If a debrief comment is on Section 4.2, the candidate blocks are the blocks that sourced 4.2. The linker’s job is to pick which of those (often three to seven candidates) the comment is actually about.

Pass 1 — heuristic narrowing

The first pass is rules. Cheap, fast, and useful for the obvious cases.

-- Given a debrief note on a proposal, find candidate blocks via section provenance.
with note_section as (
  select section_id from debrief_note_section_link
  where debrief_note_id = $1
),
candidate_blocks as (
  select b.id, b.title, b.body, sp.span_start, sp.span_end
  from kb_block b
  join section_provenance sp on sp.block_id = b.id
  join note_section ns on ns.section_id = sp.section_id
)
select * from candidate_blocks;

If the comment is on Section 4.2 and Section 4.2 was sourced from blocks B1, B3, and B7, those are the candidates. The heuristic narrowing reduces the search space from ~3,000 KB blocks to ~5.

The heuristic also strips candidates that don’t share any meaningful tokens with the comment (after stop-word removal). A comment about “encryption” that returns a candidate block titled “incident response runbook” is not the link, regardless of section provenance.
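A minimal sketch of that token filter. The stop-word list, the `Candidate` shape, and the function names here are illustrative, not our production code:

```typescript
// Hypothetical sketch of the pass-1 token filter: drop candidates that
// share no meaningful tokens with the comment after stop-word removal.
const STOP_WORDS = new Set(["the", "a", "an", "in", "of", "for", "this", "was", "is"]);

function tokens(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 2 && !STOP_WORDS.has(t))
  );
}

interface Candidate {
  id: string;
  title: string;
  body: string;
}

function filterCandidates(comment: string, candidates: Candidate[]): Candidate[] {
  const commentTokens = tokens(comment);
  return candidates.filter((c) => {
    const blockTokens = tokens(`${c.title} ${c.body}`);
    for (const t of commentTokens) if (blockTokens.has(t)) return true;
    return false;
  });
}
```

The filter only needs to be conservative: it exists to throw out the "incident response runbook" case, not to rank what survives.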

Pass 2 — entailment-based ranking

The remaining candidates go through an entailment pass. The prompt is small:

const linkerPrompt = `
A reviewer wrote this comment about a proposal section:

> ${note.text}

The section was sourced from these candidate KB blocks. For each, decide:
- Is this block plausibly the one the comment refers to? (yes/no)
- If yes, what specifically in the block does the comment seem to flag? (one sentence)
- What edit would address the comment? (one sentence)

Candidates:
${candidates.map((b, i) => `[${i + 1}] ${b.title}\n${b.excerpt}`).join("\n\n")}
`;

The model returns a ranked list. The dashboard surfaces the top candidate (with confidence) and the suggested edit. The reviewer accepts, edits, declines, or defers.
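The shape the dashboard works with after the entailment pass looks roughly like this. The field names and the threshold value are illustrative, not the real schema:

```typescript
// Hypothetical shape of a pass-2 result and the surfacing rule: auto-link
// only when exactly one candidate clears the acceptance threshold.
interface LinkSuggestion {
  blockId: string;
  confidence: number;     // 0..1, from the entailment pass
  flaggedContent: string; // what the comment seems to point at
  suggestedEdit: string;  // one-sentence proposed revision
}

const ACCEPT_THRESHOLD = 0.7; // illustrative; tuned per tenant

type SurfaceResult =
  | { kind: "auto-link"; top: LinkSuggestion }
  | { kind: "multiple"; candidates: LinkSuggestion[] }
  | { kind: "no-target" };

function surface(ranked: LinkSuggestion[]): SurfaceResult {
  const confident = ranked.filter((s) => s.confidence >= ACCEPT_THRESHOLD);
  if (confident.length === 1) return { kind: "auto-link", top: confident[0] };
  if (confident.length > 1) return { kind: "multiple", candidates: confident };
  return { kind: "no-target" };
}
```

The three branches map directly onto the reviewer experience: one confident candidate auto-links, several confident candidates become a choice, none becomes a section-level note.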

We use Claude Sonnet 4.6 for this pass. We tested Haiku and Gemini Flash — they were faster but produced more false positives, particularly when the comment was abstract (“the tone in this section felt off”). Sonnet was the right tradeoff at the cost we’re paying.

Where the linker works

For comments that name a concrete failure — wrong number, wrong audit period, missing evidence — the linker’s top suggestion is right ~80% of the time in our internal eval. The reviewer accepts directly, the edit goes into the KB block as a proposed revision, and the next bid drafting against that block sees the new content.

The accuracy is high because concrete-failure comments share vocabulary with the block. “We claimed AES-128 but the real posture is AES-256” matches the block that contains “AES-128” with a very high score.

Where it fails

Three failure modes, in order of frequency.

Abstract comments. “This section felt thin” or “the win theme didn’t land here” are valid debrief notes that don’t link to a single block. The linker’s job here is to surface that fact — the dashboard marks the comment as “no specific block target” and the reviewer can either tag it manually or leave it as a section-level note.

Comments that span blocks. “The encryption section and the access control section disagreed” is a legitimate concern that points at two blocks, not one. Today the linker handles this by surfacing both candidates with lower confidence; the reviewer chooses whether to file two suggested edits or one cross-cutting note.

Block-was-renamed-or-moved. When a KB block has been substantially edited or split since the proposal was drafted, the version hash in the provenance trail points at content that no longer exists. The linker walks the version chain to find the descendant blocks, but if the block was deleted outright, we surface the comment with a “block no longer exists” flag and ask the reviewer to confirm whether the issue is resolved.
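A sketch of the version-chain walk. The in-memory `successors` map stands in for the real version table, and the names are illustrative:

```typescript
// Hypothetical walk from a stale provenance hash to live descendant blocks.
// An edit maps one hash to one successor; a split maps it to several;
// a deletion maps it to an empty list; a hash with no entry is still live.
type VersionGraph = Map<string, string[]>; // hash -> successor hashes

function liveDescendants(graph: VersionGraph, hash: string): string[] {
  const next = graph.get(hash);
  if (next === undefined) return [hash]; // no successors: still the live version
  if (next.length === 0) return [];      // deleted outright: "block no longer exists"
  return next.flatMap((h) => liveDescendants(graph, h));
}
```

An empty result is what triggers the “block no longer exists” flag in the dashboard; a multi-element result is the split case, where the comment is surfaced against every descendant.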

The accept/decline path

When a reviewer accepts a linker suggestion, the dashboard creates a proposed revision on the target KB block. The revision is not auto-applied. It enters the block-versioning workflow — owner notification, review window, then merge.

If the reviewer declines a suggestion, the linker logs the decline as training data. After 50 declines on similar comment-block pairs, we re-tune the entailment threshold for that company’s corpus. This is one of the few places we adjust system behavior per-tenant; the alternative is shipping a default that’s wrong for half our customers.
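A hedged sketch of the per-tenant re-tune. The decline-rate formula and the constants here are illustrative; the real tuning logic is not this simple:

```typescript
// Hypothetical per-tenant threshold adjustment: once a tenant has produced
// enough decline signal, raise the acceptance bar so fewer weak links
// auto-surface for that corpus.
interface TenantLinkStats {
  declines: number;
  accepts: number;
}

const RETUNE_AFTER_DECLINES = 50;

function retuneThreshold(current: number, stats: TenantLinkStats): number {
  if (stats.declines < RETUNE_AFTER_DECLINES) return current;
  const declineRate = stats.declines / (stats.declines + stats.accepts);
  // Raise the bar in proportion to reviewer pushback, capped at 0.95.
  return Math.min(0.95, current + 0.1 * declineRate);
}
```

The point of the cap is that a threshold near 1.0 would silently turn the auto-link path off entirely, which is its own kind of wrong default.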

The numbers

Across the early-access cohort (12 customers, ~3,400 debrief notes processed in January):

  • 68% of notes auto-linked to a single block with confidence above the acceptance threshold.
  • 22% surfaced multiple candidates for the reviewer to choose.
  • 10% had no candidate (the no-target case).
  • Of the 68% auto-linked, 81% were accepted by the reviewer without modification.
  • The accepted suggestions produced ~1,800 KB block revisions across the cohort in January.

The multiple-candidate bucket — the ~22% — is still real work the dashboard is doing: without it, those reviewers would have stared at a 3,000-block KB and tried to find the right block from memory.

What we’re not claiming

The linker doesn’t tell you whether a comment is correct. It tells you what the comment is about. The judgment of whether the proposed edit is the right edit is the reviewer’s, and that’s deliberate. Fully automating the edit would mean the KB drifts based on whichever debrief comment was loudest, rather than on what the team actually believes.

This is the closing piece of the schema-and-clustering pair. Tomorrow Sarah writes about the debrief ritual that produces these comments in the first place. Without the ritual, the linker has nothing to link.

Sources

  1. The win-loss database schema, explained
  2. Answer provenance graph
  3. Shipped — answer-block diff