Field notes

Grounded AI for win-theme discovery

How we surface candidate win themes from a corpus of 80 winning proposals without inventing them. The retrieval pattern, the entailment guard, and where the system refuses rather than guesses.

The PursuitAgent engineering team · 6 min read · Grounded AI

We shipped a feature this month that suggests win themes for a new bid based on themes that have worked across past winning proposals. The risk in that sentence is the word suggest. A naive version of this feature would invent themes that read plausibly and don’t trace to anything real. This post is how we built the version that doesn’t.

The principle is the same one that runs everywhere in our product: the model rewrites from a corpus it can cite, and refuses when it can’t.

The setup

A team has 200 closed bids in PursuitAgent. 80 of them won. Each bid has 2-4 asserted win themes (capture field 6 from Part 1 of the win-loss series). Roughly 240 winning theme assertions across the corpus.

When the team starts a new bid, the win-theme suggestion feature shows up after the capture lead has named the buyer, the strategic context, and the evaluation criteria. The feature’s job is to surface 3-5 candidate themes drawn from the company’s own winning history, with citations to the bids those themes came from.

This is not a generation problem. It’s a retrieval problem with a thin synthesis layer on top.

The retrieval

We embed the new bid’s context (buyer, strategic context, evaluation language) and the existing winning theme assertions in the same space. Top-K retrieval surfaces the themes whose embedded context is most similar to the new bid’s context.
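
The query below leans on a newBidEmbedding built from those capture fields. A minimal sketch of that step, assuming a hypothetical embedText helper over whichever embedding model the deployment uses (the helper name and the concatenation format are illustrative, not the production code):

// Illustrative shape of the capture fields named above.
interface NewBidContext {
  buyerName: string;
  strategicContext: string;
  evaluationCriteria: string;
}

// Hypothetical helper: wraps whichever embedding model the deployment uses and
// returns a vector in the same space as proposal_theme.context_embedding.
declare function embedText(text: string): Promise<number[]>;

async function buildNewBidEmbedding(ctx: NewBidContext): Promise<number[]> {
  // One concatenated passage keeps buyer, strategy, and evaluation language in
  // a single vector, mirroring how the winning theme assertions were embedded.
  return embedText(
    [
      `Buyer: ${ctx.buyerName}`,
      `Strategic context: ${ctx.strategicContext}`,
      `Evaluation criteria: ${ctx.evaluationCriteria}`,
    ].join("\n")
  );
}

With that vector in hand, the retrieval itself is one query: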

const candidateThemes = await db.query`
  select t.id, t.label, t.assertion, p.id as proposal_id, p.buyer_name, p.submitted_at
  from proposal_theme t
  join proposal p on p.id = t.proposal_id
  where p.outcome = 'won'
    and p.company_id = ${companyId}
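    -- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity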
    and (1 - (t.context_embedding <=> ${newBidEmbedding})) > 0.72
  order by t.context_embedding <=> ${newBidEmbedding}
  limit 20;
`;

The cosine threshold of 0.72 was tuned on 12 customers’ corpora. Above that, the candidates are reliably “the same kind of bid.” Below that, the candidates start to drift into “vaguely related” and the suggestions get mushy.

The clustering pass

Twenty candidate themes is too many to show. The clustering pass — the same one we covered in Thursday’s clustering post — collapses near-duplicates and surfaces the underlying recurring themes.

For a healthy corpus, twenty candidates collapse to four to seven distinct themes. Each cluster has 2-6 member assertions across past winning bids.
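
Thursday's post covers the real pass; as a rough sketch of its shape, a greedy single pass over the same embeddings shows how near-duplicates collapse (the 0.85 threshold and the types here are illustrative, not the production pipeline):

interface CandidateTheme {
  id: string;
  assertion: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy single-pass grouping: a candidate joins the first cluster whose seed
// it resembles closely enough, otherwise it seeds a new cluster.
function clusterCandidates(
  candidates: CandidateTheme[],
  threshold = 0.85
): CandidateTheme[][] {
  const clusters: CandidateTheme[][] = [];
  for (const candidate of candidates) {
    const home = clusters.find(
      (cluster) =>
        cosineSimilarity(cluster[0].embedding, candidate.embedding) >= threshold
    );
    if (home) {
      home.push(candidate);
    } else {
      clusters.push([candidate]);
    }
  }
  return clusters;
}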

The synthesis — and the entailment guard

This is where most win-theme features break. The naive approach: take the cluster’s members, ask the LLM to write a single polished win theme that captures their essence. The output reads great and isn’t bound to any specific assertion.

Stanford HAI’s research on legal RAG hallucinations is the standing reminder of why this fails: 17-33% of “grounded” outputs from commercial systems aren’t supported by the retrieved evidence. A polished win theme that nobody on the team actually asserted is a fabricated past performance.

What we do instead:

const synthesisPrompt = `
The following winning win-theme assertions cluster together. Choose the single
strongest assertion from the list — verbatim, no paraphrase — that best fits
the new bid context.

Then, separately, list the 1-2 specific evidence anchors (numeric, named) that
appear in the chosen assertion and are concrete, not generic.

If no assertion in the list is concrete enough to anchor a win theme, return
"no suitable assertion."

New bid context:
${newBidContext}

Cluster members:
${clusterMembers.map((m, i) => `[${i + 1}] ${m.assertion}`).join("\n")}
`;

The model picks an existing assertion. It does not generate a new one. The output for the user is “use this verbatim assertion from the [date] bid to [buyer]; here are the evidence anchors that made it concrete.” The user can edit it for the new buyer’s context, but the starting point is something the team has actually said, in a bid the team has actually won.
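
Because the prompt asks for a choice rather than a rewrite, that promise is easy to keep mechanical: a verbatim-match guard can drop anything the model returns that isn't word for word one of the cluster members. A minimal sketch of such a guard (the whitespace normalization and return shape are illustrative):

interface ClusterMember {
  proposalId: string;
  assertion: string;
}

interface VerbatimSuggestion {
  assertion: string;
  sourceProposalId: string;
}

// Collapse whitespace so a line wrap or trailing space doesn't fail the match,
// but let nothing that is actually a paraphrase through.
const normalize = (s: string) => s.replace(/\s+/g, " ").trim();

function guardVerbatimChoice(
  modelAnswer: string,
  members: ClusterMember[]
): VerbatimSuggestion | null {
  // The prompt's explicit refusal path.
  if (/no suitable assertion/i.test(modelAnswer)) return null;
  const match = members.find(
    (m) => normalize(m.assertion) === normalize(modelAnswer)
  );
  // Anything that isn't word for word one of the retrieved assertions is a
  // generation, and generations don't ship.
  return match
    ? { assertion: match.assertion, sourceProposalId: match.proposalId }
    : null;
}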

The entailment check

Before showing the suggestion, we run a final check: do the evidence anchors named in the chosen assertion still hold? “We’ve delivered this consolidation for nine other public companies” was true when the original bid shipped. Is it still true now?

The check looks at the KB blocks that sourced the past bid’s evidence and pulls their current state. If the number has changed (now we’ve delivered for fifteen, not nine), the suggestion is updated. If the evidence has been retired, the assertion is filtered from the suggestions and the user sees fewer candidates rather than a stale one.
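
A minimal sketch of the shape of that check, assuming each stored evidence anchor keeps a pointer to the KB block that sourced it (the lookup helper and field names are illustrative):

interface EvidenceAnchor {
  kbBlockId: string;
  assertedValue: string; // the value as it appeared in the original assertion, e.g. "nine"
}

type FreshnessResult =
  | { status: "current" }
  | { status: "updated"; currentValue: string }
  | { status: "retired" };

// Hypothetical KB lookup: returns the block's current value, or null if the
// block has been retired.
declare function getCurrentKbValue(kbBlockId: string): Promise<string | null>;

async function checkAnchorFreshness(
  anchor: EvidenceAnchor
): Promise<FreshnessResult> {
  const current = await getCurrentKbValue(anchor.kbBlockId);
  if (current === null) return { status: "retired" }; // filter the suggestion entirely
  if (current !== anchor.assertedValue) {
    return { status: "updated", currentValue: current }; // refresh what the user sees
  }
  return { status: "current" };
}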

This check is the part that makes the suggestion safe to ship to a regulated buyer. Without it, the feature would route a year-old number into a new bid as if it were current. That’s not retrieval — that’s a delayed lie.

Where it works

For teams with at least 30 winning bids and KB blocks that the team maintains, the suggestions land. In our early-access cohort, capture leads accepted ~58% of suggestions verbatim, edited ~22% (mostly to retarget the assertion at the new buyer’s vocabulary), and declined ~20%. The decline path is logged so we can tune the retrieval threshold per company.

The accepted suggestions arrive in the bid as themes with provenance to the source bids. The proposal carries the citation: “this theme echoes our successful response to [buyer] in [date].” That citation is real and useful in internal review, and it can be omitted from the buyer-facing draft.

Where it refuses

Three refusal cases.

No close cluster. The new bid’s context doesn’t match anything in the winning history. Maybe it’s a new vertical for the team, or a category of work they haven’t won before. The feature returns “no high-confidence themes from your history match this bid context” and points the user at Sarah’s win-themes field guide for the from-scratch path.

Cluster too thin. The cluster has only one or two members. We don’t synthesize from a sample of two; the variance is too high. The user sees the individual assertions with their bid context as a “consider these” list, not a unified suggestion.

Stale evidence. The most concrete assertions in the cluster reference numbers or named items that the KB blocks no longer support. The feature filters them rather than rewriting them with current numbers, because the assertion as written carried a specific framing the team chose at that moment, and we don’t autorewrite past framings.
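
Taken together, the three cases collapse into a single gate in front of the suggestion list. A compressed sketch under the same illustrative assumptions as the earlier snippets (the minimum cluster size mirrors the "sample of two" rule above; none of this is the production code):

interface GatedClusterMember {
  assertion: string;
  proposalId: string;
  evidenceFresh: boolean; // result of the freshness check above
}

type ThemeSuggestion =
  | { kind: "consider_these"; assertions: string[] }
  | { kind: "suggestion"; assertion: string; sourceProposalId: string };

// Returns null when the cluster should be filtered entirely. The "no close
// cluster" case is handled upstream, when retrieval finds nothing at all above
// the similarity threshold.
function gateCluster(
  cluster: GatedClusterMember[],
  chosen: { assertion: string; sourceProposalId: string } | null
): ThemeSuggestion | null {
  // Cluster too thin: show the raw assertions as a "consider these" list.
  if (cluster.length < 3) {
    return { kind: "consider_these", assertions: cluster.map((m) => m.assertion) };
  }
  // The model declined ("no suitable assertion"), so there is nothing to show.
  if (!chosen) return null;
  // Stale evidence: filter rather than rewrite with current numbers.
  const member = cluster.find((m) => m.assertion === chosen.assertion);
  if (!member || !member.evidenceFresh) return null;
  return { kind: "suggestion", ...chosen };
}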

What this is not

It is not a generator that writes a win theme for you. It is a retriever that surfaces themes you’ve actually won with, ranked by relevance to the current bid, filtered for evidence freshness. The synthesis step picks an existing winner; it does not invent a new one.

If you want a generator, any general-purpose LLM will write you a polished win theme in three seconds. It won't be grounded in anything your team has actually delivered. The Stanford HAI numbers (17-33% hallucination even on grounded systems) are the floor of what an ungrounded system fabricates.

This feature is a small addition to the dashboard, surfacing the same content the embedding clustering post describes at an earlier point in the workflow. It exists because capture leads asked for the suggestions to flow into the bid kickoff, not just the post-bid analysis.

Sources

  1. Stanford HAI — Legal RAG hallucinations
  2. Grounded retrieval pillar
  3. Sarah — The win themes field guide
  4. Clustering win themes across 200 past bids