Blog · Tag: retrieval

28 posts in this archive.

Engineering

Query understanding, a year on: where the model won

A year of hand-written query-rewrite rules versus LLM-based query rewriting on RFP questions. Which side won, where the hand-written rules still beat the model, and what the hybrid looks like now.

The PursuitAgent engineering team
Engineering

The SLA on draft generation: 45 seconds, 95th percentile

The operational target we hold draft generation to, why it's 45 seconds and not 30 or 90, and the specific things we do to hold the number under peak federal-FY-Q2 load.

The PursuitAgent engineering team
Engineering · Long read

One year of grounded retrieval: what changed, what didn't

The engineering companion to the founder retrospective. A year of build-log posts, condensed: what the retrieval stack looks like now, how verification evolved, what the gold set became, and what's still unsolved.

The PursuitAgent engineering team
Engineering

Embedding evaluation, revisited

What we measure differently from 12 months ago. How the gold set grew, which metrics earned their spot in CI, and which ones we quietly retired.

The PursuitAgent engineering team
Grounded AI

Grounded AI for win-theme discovery

How we surface candidate win themes from a corpus of 80 winning proposals without inventing them. The retrieval pattern, the entailment guard, and where the system refuses rather than guesses.

The PursuitAgent engineering team
Engineering

Migrating to Gemini Embedding v3, the safe way

A dual-index backfill and a staged cutover across two weeks. How we evaluated retrieval deltas before the switch, what we watched for during the cutover, and the one metric that gated the final flip.

The PursuitAgent engineering team
Engineering

Retrieval over Slack history: what works, what's too sharp

An experiment with RAG over customer-Slack channel history. Three useful retrieval patterns, two failure modes that led us to gate the feature behind explicit capture flags, and the operational guardrails.

The PursuitAgent engineering team
Grounded AI

When two citations disagree: how the draft resolves it

Two KB chunks say different things about the same claim. The conflict-resolution logic that decides which one the drafted answer cites — when to prefer newer, when to prefer higher-authority, and when to refuse.

The PursuitAgent engineering team
Engineering

Retrieval eval snapshot, December 2025

Quarter four retrieval evaluation numbers against our held-out RFP and DDQ corpus. What moved since September, what's still stuck, and which regressions we're not yet fixing.

The PursuitAgent engineering team
Engineering

Tuning pgvector HNSW for proposal workloads

M, ef_construction, ef_search — the three knobs that decide retrieval latency and recall in a pgvector HNSW index. What we chose for PursuitAgent and why.

The PursuitAgent engineering team
Procurement · Long read

Security questionnaires: the 80% that's really retrieval

The canonical pillar post on DDQ automation. A 300-question security questionnaire is not 300 unique questions — it's mostly retrieval against a corpus that's already written, plus a small tail that isn't.

The PursuitAgent engineering team
Engineering

Our retrieval eval, quarterly report

A quarter of running our retrieval evaluation harness against a frozen gold set: the regressions we caught, the two changes that actually moved precision, and the metric we stopped reporting because it lied.

The PursuitAgent engineering team
Engineering

Retrieval evaluation, part 2: dealing with numeric claims

Why numeric facts break vanilla retrieval and the two tactics — hybrid search and numeric-claim isolation — that fix it. Continuation of the eval series.

The PursuitAgent engineering team
Engineering

How we curate the retrieval gold set

120 questions, three annotators, a disagreement-resolution protocol. The recipe behind the held-out set we evaluate every retrieval pipeline change against — and the parts we plan to open-source.

The PursuitAgent engineering team
Grounded AI

The reranker that paid for itself

Rerankers add latency and cost. They earn it back when retrieval is borderline and the wrong block in the top-K poisons the draft. Where we run a reranker, where we do not, and the honest tradeoffs.

The PursuitAgent engineering team
Engineering

Query rewriting for RFP questions with implicit context

Most RFP questions retrieve poorly because they assume context the corpus does not carry. Query rewriting turns 'describe your approach' into a retrieval string that hits. Examples, the rewrite chain, and the cost tradeoff.

The PursuitAgent engineering team
Engineering

The chunk size ablation: 256, 512, 1024 tokens on RFP text

We ran the same retrieval pipeline at three chunk sizes against our RFP-text gold set. Directional results, the tradeoffs that surfaced, and why we don't ship a single global chunk size.

The PursuitAgent engineering team
Engineering

Our eval harness, on the command line

A walkthrough of the dev loop for retrieval changes — one command to baseline, one command to re-run, one to diff. The CLI ergonomics that keep us from tuning by feel.

The PursuitAgent engineering team
Engineering · Long read

How we evaluate retrieval quality on our own corpus

Our gold set, the metrics we track, the eval harness on a laptop, the regression-guard CI job, and the directional numbers we'll publicly stand behind.

The PursuitAgent engineering team
Engineering

Our retrieval latency budget, explained

Where the milliseconds go in a single retrieval call: embedding lookup, vector search, reranker, hybrid merge, payload hydration. P50 120ms, P95 400ms, and what we cut to get there.

The PursuitAgent engineering team
Engineering

Hybrid search: dense embeddings plus BM25 for proposals

Pure dense retrieval misses on numeric identifiers, product names, and SOC codes. Pure BM25 misses on paraphrase. The blend ratio we use, how we tune it, and the test set that catches regressions.

The PursuitAgent engineering team
Grounded AI · Grounded Retrieval 101 · Part 4/4

Grounded Retrieval 101, Part 4: what we're still wrong about

The closing post of the Grounded Retrieval 101 series. Three failure modes we have not solved — numeric precision, compound claims, synonym drift — with the test cases that surface them and what we are doing about each.

The PursuitAgent engineering team
Engineering

Testing retrieval: gold sets, precision@k, and why BLEU lies for proposals

Surface-form metrics like BLEU and ROUGE rate proposal text by token overlap. Token overlap is a poor proxy for whether the answer is actually right. Here's the eval stack we use instead.

The PursuitAgent engineering team
Grounded AI · Long read

Grounded retrieval: what it is, what it isn't, what we measure

The canonical long-read on grounded retrieval: the three invariants, the anti-patterns, the eval harness, the four open failure modes, and the research we're running next.

The PursuitAgent engineering team
Engineering

Our chunking pipeline, end to end

Five stages between an uploaded PDF and a retrievable KB block: parse, structural split, semantic rechunk, overlap, and index. Where each one fails and why we kept the boundaries.

The PursuitAgent engineering team
Engineering

Embedding model selection: why Gemini Embedding 2 for proposals

A teardown of how we evaluated four embedding models — Gemini Embedding 2, OpenAI text-embedding-3-large, Cohere embed-v4, and Voyage — for a proposal corpus, and the methodology that drove the choice.

The PursuitAgent research team
Grounded AI · Grounded Retrieval 101 · Part 1/4

Grounded Retrieval 101, Part 1: what RAG is and why it still hallucinates

RAG in three sentences, then the hard part: why retrieval-augmented generation still produces fabricated answers, and what the academic and practitioner literature says about it. Part 1 of a four-part series.

The PursuitAgent engineering team
Engineering

How we chunk proposals for retrieval

Fixed-window chunking loses at headers, table cells, and numeric clauses. This post walks through the structural-plus-semantic chunking strategy we run on past proposals and KB content blocks, with code.

The PursuitAgent engineering team
