Field notes

In preview: question router v2 with confidence scores

DDQ questions now route with a confidence score in preview. High-confidence questions auto-draft from the KB; low-confidence questions go to human review with a typed reason for the routing call.

PursuitAgent · Engineering

Question router v2 is in preview behind the DDQ feature flag. The change: every DDQ (due diligence questionnaire) question now carries a confidence score from the routing step, and the score determines whether the question auto-drafts from the KB, goes to a writer with a candidate block attached, or routes to a human reviewer. DDQ is a pursuit type that the platform's marketing pages do not yet describe; this work is the routing layer we will surface when DDQ joins RFP Analysis on the platform site.

What the score means

The confidence score is a number between 0 and 1, computed as a weighted combination of three signals:

  • Retrieval certainty — the score gap between the top retrieval candidate and the second candidate. A wide gap means the system is confident it found the right block; a narrow gap means there are several plausible matches and the routing should not assume the top is correct.
  • Block freshness — how recently the candidate block was reviewed and approved. Stale blocks reduce confidence regardless of retrieval certainty.
  • Verifier pre-check — a lightweight entailment check that runs before drafting. If the candidate block doesn’t appear to entail the question’s expected answer shape, confidence drops.

The combined score determines routing.
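
A minimal sketch of that weighted combination, in Python. The weights and every name here (`RoutingSignals`, `WEIGHTS`, `confidence_score`) are illustrative assumptions; this post does not publish the actual weights or how each signal is normalized.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RoutingSignals:
    """The three signals, each assumed to be pre-normalized to [0, 1]."""
    retrieval_certainty: float  # gap between the top and second retrieval candidates
    block_freshness: float      # 1.0 = just reviewed and approved, 0.0 = stale
    verifier_precheck: float    # result of the lightweight entailment pre-check


# Hypothetical weights; the real values are not given in this post.
WEIGHTS = {
    "retrieval_certainty": 0.5,
    "block_freshness": 0.2,
    "verifier_precheck": 0.3,
}


def confidence_score(signals: RoutingSignals,
                     weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted average of the three signals, clamped to [0, 1]."""
    total = sum(weights.values())
    raw = sum(getattr(signals, name) * w for name, w in weights.items()) / total
    return max(0.0, min(1.0, raw))
```

Dividing by the weight total keeps the result in [0, 1] even if a project supplies weights that do not sum to 1.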

Routing thresholds

  • ≥ 0.85 — auto-draft. The question routes to the drafting layer, which produces an answer grounded in the candidate block, with the standard citation footer. The answer enters the proposal in auto-drafted state, awaiting reviewer confirmation but not blocking other work.
  • 0.65 to < 0.85 — assisted. The question routes to a writer with the top candidate pre-attached. The writer drafts the answer with the candidate as the source; the system verifies entailment before the answer enters review state.
  • < 0.65 — human review with typed reason. The question routes to a security or domain reviewer. The routing pane shows the typed reason: retrieval split between two candidates; candidate block stale; no candidate block above retrieval floor; or entailment pre-check failed on top candidate. The reviewer then selects from the candidate list, points the question at a different block, or flags the question for new content creation.
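
One way to picture the typed reasons is as an enumeration derived from the routing signals. The sketch below is an illustration only: the reason strings come from the list above, but the function name `diagnose`, its parameters, the cutoff values, and the choice to return every applicable reason rather than a single one are all assumptions.

```python
from enum import Enum
from typing import List, Optional


class RoutingReason(Enum):
    # Reason strings as listed for the routing pane.
    RETRIEVAL_SPLIT = "retrieval split between two candidates"
    BLOCK_STALE = "candidate block stale"
    NO_CANDIDATE = "no candidate block above retrieval floor"
    ENTAILMENT_FAILED = "entailment pre-check failed on top candidate"


def diagnose(
    top_score: Optional[float],
    second_score: Optional[float],
    days_since_review: int,
    entailment_passed: bool,
    retrieval_floor: float = 0.30,  # assumed cutoff
    min_gap: float = 0.10,          # assumed cutoff
    max_age_days: int = 180,        # assumed cutoff
) -> List[RoutingReason]:
    """Return every reason that applies, in a fixed order."""
    # With no usable candidate there is nothing further to check.
    if top_score is None or top_score < retrieval_floor:
        return [RoutingReason.NO_CANDIDATE]
    reasons: List[RoutingReason] = []
    if second_score is not None and top_score - second_score < min_gap:
        reasons.append(RoutingReason.RETRIEVAL_SPLIT)
    if days_since_review > max_age_days:
        reasons.append(RoutingReason.BLOCK_STALE)
    if not entailment_passed:
        reasons.append(RoutingReason.ENTAILMENT_FAILED)
    return reasons
```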

The thresholds are tunable per project; the defaults above match what we observed in internal evaluation. Strict-mode pursuits (federal acquisition, regulated security questionnaires) typically raise the auto-draft threshold to 0.92.
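
The threshold logic can be sketched in a few lines of Python. The threshold values (0.85, 0.65, and the strict-mode 0.92) come from this post; the names `Route`, `Thresholds`, `STRICT`, and `route` are hypothetical, and the sketch assumes strict mode changes only the auto-draft cutoff.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_DRAFT = "auto-draft"
    ASSISTED = "assisted"
    HUMAN_REVIEW = "human review"


@dataclass(frozen=True)
class Thresholds:
    auto_draft: float = 0.85  # default from the post
    assisted: float = 0.65    # default from the post

    def __post_init__(self) -> None:
        # Reject configurations where the bands would overlap or leave [0, 1].
        if not 0.0 <= self.assisted <= self.auto_draft <= 1.0:
            raise ValueError("need 0 <= assisted <= auto_draft <= 1")


DEFAULT = Thresholds()
STRICT = Thresholds(auto_draft=0.92)  # strict-mode pursuits; assisted cutoff assumed unchanged


def route(score: float, thresholds: Thresholds = DEFAULT) -> Route:
    """Map a confidence score to a routing state."""
    if score >= thresholds.auto_draft:
        return Route.AUTO_DRAFT
    if score >= thresholds.assisted:
        return Route.ASSISTED
    return Route.HUMAN_REVIEW
```

For example, a score of 0.88 auto-drafts under the defaults but falls into the assisted band under `STRICT`.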

What this changes operationally

Two things, both visible in the workflow.

The first: the proposal pane now shows a count of questions in each routing state. Reviewers can prioritize the human review queue without reading every auto-drafted answer first. The auto-draft answers still need confirmation before submission, but they don’t block parallel review work.
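
As a rough illustration of the per-state count, the sketch below tallies questions by routing state and pulls out the human review queue. The record shape (`id`, `state`) and the function names are assumptions, not the platform's data model.

```python
from collections import Counter
from typing import Dict, List


def state_counts(questions: List[Dict[str, str]]) -> Counter:
    """Count questions in each routing state."""
    return Counter(q["state"] for q in questions)


def human_review_queue(questions: List[Dict[str, str]]) -> List[str]:
    """IDs of questions waiting on a human reviewer, in their original order."""
    return [q["id"] for q in questions if q["state"] == "human review"]


# Hypothetical sample data.
questions = [
    {"id": "Q1", "state": "auto-draft"},
    {"id": "Q2", "state": "human review"},
    {"id": "Q3", "state": "assisted"},
    {"id": "Q4", "state": "auto-draft"},
    {"id": "Q5", "state": "human review"},
]

counts = state_counts(questions)
queue = human_review_queue(questions)
```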

The second: the typed reasons surface routing failures to the reviewer in a useful form. When a question goes to human review with the reason “candidate block stale,” the reviewer knows the fix is “ask the SME to refresh the source block,” not “rewrite the answer from scratch.” The router has done the diagnostic work; the reviewer is acting on a tagged finding rather than a blank pane.

What we don’t claim

The confidence score is not a guarantee of correctness. A 0.92-confident answer can still be wrong if the candidate block itself is wrong about the underlying fact. The score reflects the system’s certainty about its routing, not the underlying truth of the source. Citation discipline still applies: every auto-drafted answer carries the same provenance graph and the same reviewer-clickable citation footer as a hand-drafted one.

What’s next

The next router work is on multi-block compositions — questions whose answer should draw from two source blocks (capability-plus-attestation patterns are common in security questionnaires). The current router treats each question as single-block; multi-block routing is in development behind a feature flag.

Docs: see DDQ Workflow → Routing for the full state diagram and the reviewer-pane walkthrough.

Sources

  1. PursuitAgent — DDQ classification, shipped
  2. PursuitAgent — DDQ response playbook
  3. PursuitAgent — Confidence scores for grounded drafts (preview)