
Grounded AI.

41 posts in this archive.

Grounded AI

Citations are the product, a year in

The through-line from our grounded-AI pledge at month one to how reviewers judge the product today. Citations were a feature. Now they're what customers are buying.

The PursuitAgent engineering team
Engineering

Draft latency, a year on: 45s P95 to 28s

A year of draft-latency work. What moved P95 from 45 seconds to 28, which changes cost quality and which cost money, and the three tradeoffs we chose not to take.

The PursuitAgent engineering team
Engineering

Query understanding, a year on: where the model won

A year of hand-written query-rewrite rules versus LLM-based query rewriting on RFP questions. Which side won, where the hand-written rules still beat the model, and what the hybrid looks like now.

The PursuitAgent engineering team
Grounded AI

Hallucination rate: a year-in measurement update

How we measure hallucination rate on grounded drafts, what the number looks like a year in, what moved it since the early baseline, and where the number lives in production for customers to see.

The PursuitAgent engineering team
Engineering

Shipped: grounded-summary export with inline sources

The export path customers have asked for since month one. Executive summary exports now carry the inline citations as hyperlinks in the DOCX and PDF outputs, with an appendix that lists every evidence source in order.

PursuitAgent
Grounded AI

New models, quarterly eval: Sonnet 4.6, GPT-5.2, Gemini 3.1 Pro

An internal eval across three current-generation models for our specific workloads — drafting, claim verification, extraction. What moved, where we switched defaults, and why one workload still sits on a year-old model.

The PursuitAgent engineering team
Grounded AI

The claim-verification cost profile, stage by stage

Per-claim verification is the defense against citation hallucination. It also costs real money. A breakdown of token costs at each stage of the verification pipeline, with the numbers we actually see in production.

The PursuitAgent engineering team
Category

Competing on groundedness, not features

Opinion. The AI proposal category is running a feature race. The only durable edge for an AI-native tool is whether its outputs are traceable — and that is not a feature you ship. It is a posture you hold.

Bo Bergstrom
Engineering Long read

One year of grounded retrieval: what changed, what didn't

The engineering companion to the founder retrospective. A year of build-log posts, condensed: what the retrieval stack looks like now, how verification evolved, what the gold set became, and what's still unsolved.

The PursuitAgent engineering team
Grounded AI

Grounded AI for win-theme discovery

How we surface candidate win themes from a corpus of 80 winning proposals without inventing them. The retrieval pattern, the entailment guard, and where the system refuses rather than guesses.

The PursuitAgent engineering team
Category Long read

A full-year retrospective on shipping grounded AI

Twelve months of evidence on the grounded-AI thesis. The Stanford hallucination number measured against our corpus, four failure modes and which ones we closed, what changed under the hood, and what I would tell Q1 Bo.

Bo Bergstrom
Grounded AI

When two citations disagree: how the draft resolves it

Two KB chunks say different things about the same claim. The conflict-resolution logic that decides which one the drafted answer cites — when to prefer newer, when to prefer higher-authority, and when to refuse.

The PursuitAgent engineering team
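The resolution order described in that teaser can be sketched in a few lines. This is a minimal illustration, not the production logic; the field names (`updated`, `authority`) and the one-year staleness cutoff are hypothetical stand-ins for whatever the real KB schema and policy use:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Chunk:
    text: str
    updated: date      # last KB update for this source (hypothetical field)
    authority: int     # e.g. 2 = signed contract, 1 = internal wiki

def resolve_conflict(a: Chunk, b: Chunk, today: date,
                     max_staleness_days: int = 365) -> Optional[Chunk]:
    """Pick which of two disagreeing chunks the draft should cite.

    Prefer higher authority; break ties by recency; refuse (None)
    when even the newer chunk is too stale to arbitrate.
    """
    newest = max(a.updated, b.updated)
    if (today - newest).days > max_staleness_days:
        return None  # refuse: both sources too old to trust
    if a.authority != b.authority:
        return a if a.authority > b.authority else b
    return a if a.updated >= b.updated else b
```

The refusal branch matters as much as the two preference branches: when neither source clears the staleness bar, the draft declines to cite rather than guessing.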
Grounded AI

Grounded-AI regressions we caught in year one

Four regressions in our grounded-drafting pipeline this year. How we caught each one, how long it took to roll back, and the one we did not catch in time. Engineering notes, not a victory lap.

The PursuitAgent engineering team
Grounded AI

A Christmas Eve draft review note

Two drafts crossed my desk this morning, Christmas Eve. What they said, what they missed, and why the review still happened despite the date.

PursuitAgent
Engineering

The prompt library behind grounded drafting

Seven named prompts, one kill-switch registry, a versioning scheme, and the governance pattern we use to keep prompt sprawl from becoming an outage. Engineering notes on how we actually run prompts in production.

The PursuitAgent engineering team
Category Long read

What 'compounding' means for proposal software

PursuitAgent's tagline is 'every RFP you win makes the next one easier.' What that actually requires: four mechanisms of compounding, why most AI tools aren't compounding tools, and questions to ask a vendor.

Bo Bergstrom
Grounded AI

Detecting ungrounded spans in drafts, line by line

A per-sentence classifier that flags which spans in a drafted RFP answer lack source coverage in the retrieved context. What it costs, what it catches, and what it still misses.

The PursuitAgent engineering team
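The shape of that per-sentence check can be sketched with a crude proxy. Here, content-word overlap stands in for the learned classifier the post describes; the interface is the same, the signal is much weaker, and the 0.5 cutoff is an assumption:

```python
import re

def token_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def flag_ungrounded(draft: str, context_blocks: list[str],
                    min_overlap: float = 0.5) -> list[str]:
    """Return draft sentences whose content words are not
    sufficiently covered by any retrieved block."""
    flagged = []
    block_tokens = [token_set(b) for b in context_blocks]
    for sent in re.split(r"(?<=[.!?])\s+", draft.strip()):
        words = token_set(sent)
        if not words:
            continue
        # best coverage of this sentence by any single block
        best = max((len(words & bt) / len(words) for bt in block_tokens),
                   default=0.0)
        if best < min_overlap:
            flagged.append(sent)
    return flagged
```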
Grounded AI

Confidence-threshold tuning for DDQ auto-answer

Where we set the confidence bar for auto-answering a DDQ question. The precision/recall trade-off, explained with our own data and the number we actually use for security questionnaires.

The PursuitAgent engineering team
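The trade-off in that post reduces to a one-directional search: lower thresholds auto-answer more questions (recall) at the cost of precision. A minimal sketch of picking the bar from a labeled sample, with the 0.98 precision floor as a placeholder rather than the number the post reports:

```python
def pick_threshold(scored: list[tuple[float, bool]],
                   precision_floor: float = 0.98) -> float:
    """Choose the lowest confidence threshold whose auto-answers
    still meet a precision floor.

    `scored` pairs a model confidence with whether that auto-answer
    was correct, judged against a labeled sample.
    """
    for t in sorted({s for s, _ in scored}):
        kept = [ok for s, ok in scored if s >= t]
        if kept and sum(kept) / len(kept) >= precision_floor:
            return t
    return 1.0  # no threshold meets the floor: auto-answer nothing
```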
Procurement Long read

Security questionnaires: the 80% that's really retrieval

The canonical Engineering pillar on DDQ automation. A 300-question security questionnaire is not 300 unique questions — it's mostly retrieval against a corpus that's already written, plus a small tail that isn't.

The PursuitAgent engineering team
Grounded AI

Citation UI: three designs we tried, two we kept

How we render inline citations next to grounded-AI output. Three UX experiments — footnote chips, side-pane evidence cards, and inline hover popovers — and what we learned about which ones reviewers actually use.

The PursuitAgent engineering team
Category

Grounded AI is not a feature, it's a refusal

Opinion. The thing that makes grounded AI different from regular AI is what the system refuses to do — answer when retrieval is empty. Here's what we will not ship even when reviewers ask for it.

Bo Bergstrom
Grounded AI

Hallucination monitoring in production

The metric we watch weekly: per-claim refusal rate, citation-mismatch rate, and the human-graded sample. What we do when each one moves, and the threshold values that trigger an alert.

The PursuitAgent engineering team
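The alerting described there is a band check, not a ceiling check. A sketch with hypothetical threshold values (the post gives the real ones); note the floor on refusal rate, since a rate near zero usually means the refusal path broke rather than that retrieval got good:

```python
# hypothetical healthy bands as (min, max); the post has the real values
THRESHOLDS = {
    "refusal_rate":           (0.02, 0.15),
    "citation_mismatch_rate": (0.0, 0.01),
    "human_graded_error":     (0.0, 0.02),
}

def check_weekly(metrics: dict[str, float]) -> list[str]:
    """Return an alert string for any weekly metric outside its band."""
    alerts = []
    for name, (lo, hi) in THRESHOLDS.items():
        v = metrics.get(name)
        if v is None:
            alerts.append(f"{name}: missing")
        elif not (lo <= v <= hi):
            alerts.append(f"{name}: {v:.3f} outside [{lo}, {hi}]")
    return alerts
```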
Procurement

Security questionnaires: linking answers to evidence

How a SOC 2 attestation PDF becomes a citation source for DDQ answers. The ingest pipeline, the per-control extraction, and the per-claim linking that makes 'yes' answers verifiable instead of theatrical.

The PursuitAgent engineering team
Engineering

The citation density target per section

Why executive summaries get two citations per paragraph and technical sections get five. The rationale for citation density as a section-level target, and what happens to drafts that fall below it.

The PursuitAgent engineering team
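The density check itself is simple once the targets are fixed. This sketch uses the two-per-paragraph and five-per-paragraph targets from the teaser; the `[n]` citation-marker rendering is an assumption about the draft format:

```python
import re

TARGETS = {"executive_summary": 2, "technical": 5}

def below_density_target(section_type: str,
                         paragraphs: list[str]) -> list[int]:
    """Return indices of paragraphs falling below the section-level
    citation-density target, assuming citations render as [n]."""
    target = TARGETS.get(section_type, 2)
    return [i for i, p in enumerate(paragraphs)
            if len(re.findall(r"\[\d+\]", p)) < target]
```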
Grounded AI

Numeric claim extraction and verification

How we parse numbers from drafts — percentages, dollar figures, head counts, dates — and check each one against a KB source before the sentence ships. The pipeline, the regex floor, the LLM ceiling, and what we still get wrong.

The PursuitAgent engineering team
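The "regex floor" mentioned in that teaser might look like the sketch below: patterns that catch unambiguous numeric spans, with everything they miss falling through to the LLM extractor. The specific patterns here are illustrative, not the production set:

```python
import re

# alternation order matters: percentages and dollar figures must
# match before the bare-count fallback
NUMERIC = re.compile(
    r"\$\s?\d[\d,]*(?:\.\d+)?"        # dollar figures: $1,200.50
    r"|\b\d{1,3}(?:\.\d+)?\s?%"       # percentages: 99.9%
    r"|\b\d{4}-\d{2}-\d{2}\b"         # ISO dates: 2024-06-01
    r"|\b\d[\d,]*\b"                  # bare counts: 300
)

def extract_numeric_claims(sentence: str) -> list[str]:
    """Pull every numeric span out of a drafted sentence so each can
    be checked against a KB source before the sentence ships."""
    return NUMERIC.findall(sentence)
```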
Grounded AI

Confidence scores for grounded drafts, explained

What '82% confident' means in our drafting engine, how it's computed from retrieval and entailment signals, and how a reviewer should act on it.

The PursuitAgent engineering team
Category Feature

The AutogenAI teardown: UK-origin RFP AI, two years in

What's public about AutogenAI: UK origin, generation-heavy stack, where they win in EU procurement, where the citation discipline is thin, and what we learned reading their materials.

The PursuitAgent research team
Grounded AI

Retrieval over diagrams, not just text

How we index D2 code and diagram descriptions so an architecture question can ground to a specific figure. The pipeline, the failure modes, and the citation surface for a diagram source.

The PursuitAgent engineering team
Category

Why we don't do autonomous proposal agents yet

Opinion. What an agentic drafting system would have to guarantee that retrieval doesn't, why we don't think the category is ready, and the work we'd want to see before changing our position.

Bo Bergstrom
Grounded AI

The reranker that paid for itself

Rerankers add latency and cost. They earn it back when retrieval is borderline and the wrong block in the top-K poisons the draft. Where we run a reranker, where we do not, and the honest tradeoffs.

The PursuitAgent engineering team
Grounded AI

The grounded drafting loop, step by step

Retrieve, draft under constraint, verify, emit — or refuse. The four-step loop that produces every drafted answer in PursuitAgent, and the failure mode each step exists to prevent.

The PursuitAgent engineering team
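The four-step loop from that post fits in a dozen lines when the stages are treated as pluggable callables. This is a structural sketch only; the real `retrieve`, `draft`, and `verify` stages are what the post is about:

```python
from typing import Callable, Optional

def grounded_answer(question: str,
                    retrieve: Callable[[str], list[str]],
                    draft: Callable[[str, list[str]], str],
                    verify: Callable[[str, list[str]], bool]) -> Optional[str]:
    """Retrieve, draft under constraint, verify, then emit or refuse.

    Each guard exists for a failure mode: empty retrieval guards
    against answering from parametric memory; the verify gate guards
    against drafts that drift past their sources. None means refusal.
    """
    blocks = retrieve(question)
    if not blocks:
        return None    # refuse: nothing to ground on
    answer = draft(question, blocks)
    if not verify(answer, blocks):
        return None    # refuse: draft not entailed by its sources
    return answer
```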
Engineering

Shipped: the inline verify button in drafts

Hover any drafted sentence in the proposal builder and a verify button surfaces the source block, the entailment trace, and the timestamp of the last KB update. Shipped this week.

PursuitAgent
Engineering Long read

How we evaluate retrieval quality on our own corpus

Our gold set, the metrics we track, the eval harness that runs on a laptop, the regression-guard CI job, and the directional numbers we'll publicly stand behind.

The PursuitAgent engineering team
Grounded AI

The hallucination budget, per claim

Treat hallucination as a cost: each claim in a draft has a probability of being mis-attributed. Here's how we budget it, how we trade latency against grounding strength, and why the budget is per-claim, not per-draft.

The PursuitAgent engineering team
Grounded AI

The claim-level verification pass, explained

After the draft model writes a sentence, a smaller verifier model reads each substantive claim and asks: is this entailed by the source block? Here's how that works, what it costs, and where it still misses.

The PursuitAgent engineering team
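The per-claim pass can be sketched around a single question asked repeatedly. Here `entails(premise, hypothesis)` stands in for the verifier-model call, and splitting on clause boundaries is a crude stand-in for real claim extraction; both are assumptions of this sketch:

```python
import re
from typing import Callable

def verify_claims(draft_sentence: str, source_block: str,
                  entails: Callable[[str, str], bool]) -> dict[str, bool]:
    """For each substantive claim in a drafted sentence, ask the
    verifier: is this entailed by the source block?"""
    claims = [c.strip()
              for c in re.split(r"[;,] | and ", draft_sentence)
              if c.strip()]
    return {c: entails(source_block, c) for c in claims}
```

A draft sentence passes only when every claim in the returned map is true; a single failed claim sends the sentence back for redrafting or refusal.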
Grounded AI Grounded Retrieval 101 · Part 4/4

Grounded Retrieval 101, Part 4: what we're still wrong about

The closing post of the Grounded Retrieval 101 series. Three failure modes we have not solved — numeric precision, compound claims, synonym drift — with the test cases that surface them and what we are doing about each.

The PursuitAgent engineering team
Grounded AI Grounded Retrieval 101 · Part 3/4

Grounded Retrieval 101, Part 3: the citation rendering stack

From a verified retrieval hit to an inline citation a reviewer can hover and trust. Four components: citation marker, hover card, source viewer, and audit log.

The PursuitAgent engineering team
Grounded AI Long read

Grounded retrieval: what it is, what it isn't, what we measure

The canonical long-read on grounded retrieval: the three invariants, the anti-patterns, the eval harness, the four open failure modes, and the research we're running next.

The PursuitAgent engineering team
Grounded AI Grounded Retrieval 101 · Part 2/4

Grounded Retrieval 101, Part 2: why citations don't guarantee groundedness

A citation tells you which passage was retrieved. It does not tell you whether the cited passage actually supports the generated claim. Part 2 of the Grounded Retrieval series — the entailment gap, and what closes it.

The PursuitAgent engineering team
Grounded AI Grounded Retrieval 101 · Part 1/4

Grounded Retrieval 101, Part 1: what RAG is and why it still hallucinates

RAG in three sentences, then the hard part: why retrieval-augmented generation still produces fabricated answers, and what the academic and practitioner literature says about it. Part 1 of a four-part series.

The PursuitAgent engineering team
Grounded AI

How the Grounded-AI Pledge is enforced in code

The Pledge says every drafted answer links to a source in your KB. Here's how the drafting engine enforces that — with refusals, not with model hygiene.

The PursuitAgent engineering team

See the proposal workflow

Take the 5-minute tour, then start a trial workspace when you're ready to run a real pursuit against your own source material.