Field notes

Year-one numbers we can share

The product metrics we're willing to post publicly after year one, the ones we aren't, and the reasoning behind each line. An honest ledger instead of a slide deck.

Bo Bergstrom · 5 min read · Team & Workflow

The default startup posture at year-end is a metrics post with impressive-looking numbers, an arrow pointing up and to the right, and no context. I am going to try something different. What follows is the list of product metrics we tracked internally in 2025, with a column for whether we are willing to post the number publicly and a column for why or why not.

The metrics

Number of paying customers. Not publishing. We are small enough that the number would signal more about our sales pipeline than about the product; it would also expose a handful of early customers who did not sign up to be a public data point. Wave 2 of the category report in February will quote it in aggregate.

Annual recurring revenue. Not publishing. Same reasoning as above, plus revenue numbers from a seven-figure company tell you nothing useful about whether the product works.

Proposal responses processed through the system. Publishing in aggregate — tens of thousands — without per-customer breakdown. This number is useful because it establishes that the product carries real production load rather than being a demo.

Citation verification rate (internal eval). Publishing: 94.3% of drafted spans pass the span-level verification prompt on the current evaluation suite. Context is required: the evaluation suite is 400 questions we designed. This is an internal number, not an independent benchmark. It has moved from 81% in January to 94.3% in November. We are publishing the number because we are also publishing the methodology and the caveats. A number without the caveats would be worse than no number.
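For readers who want the shape of that measurement, here is a minimal sketch of how a span-level pass rate can be computed over a fixed suite. The record layout and the `verify` callable are illustrative stand-ins, not our actual harness; in practice the verifier is an LLM call carrying the verification prompt.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DraftedSpan:
    text: str            # one drafted claim or sentence
    sources: list[str]   # the retrieved passages it cites

def verification_rate(
    spans: list[DraftedSpan],
    verify: Callable[[DraftedSpan], bool],
) -> float:
    """Fraction of drafted spans the verifier accepts as grounded.
    `verify` stands in for the span-level verification prompt."""
    if not spans:
        return 0.0
    return sum(verify(s) for s in spans) / len(spans)
```

Held constant over the same 400-question suite, the same calculation produces the January and November numbers on comparable footing.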

Human-edit rate on drafted sections. Publishing: roughly 38% of drafted sections are edited substantively before ship. Competitors often quote the complement of this number as “AI does 60-80% of the work.” We think that framing is misleading; a draft that needs a substantive edit in almost 4 out of 10 sections is a draft that is useful, not one that is finished. We publish the number to anchor expectations, not to claim a win.
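“Substantive” has to be pinned down for the number to mean anything. One way to operationalize it is a diff ratio between the drafted and shipped text; the sketch below is illustrative, and the 15% cutoff is an assumed threshold, not our actual definition.

```python
import difflib

def is_substantive_edit(draft: str, shipped: str, threshold: float = 0.15) -> bool:
    """Count an edit as substantive if more than `threshold` of the drafted
    text changed before ship. The cutoff here is illustrative, not a standard."""
    similarity = difflib.SequenceMatcher(None, draft, shipped).ratio()
    return (1.0 - similarity) > threshold

def edit_rate(sections: list[tuple[str, str]]) -> float:
    """Share of (draft, shipped) section pairs that crossed the threshold."""
    if not sections:
        return 0.0
    return sum(is_substantive_edit(d, s) for d, s in sections) / len(sections)
```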

Time from RFP upload to compliance matrix ready. Publishing: median 42 minutes on a 50-100 page RFP. The spread is wide — the 90th percentile is 2.5 hours because OCR on a scanned document stretches intake materially. The median is the honest number; the spread is the honest context.
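Both numbers fall out of the same batch of completed intakes. A standard-library sketch, assuming a list of minutes from upload to matrix-ready:

```python
import statistics

def intake_latency(minutes: list[float]) -> tuple[float, float]:
    """Median and 90th-percentile minutes from RFP upload to
    compliance-matrix ready, over a batch of completed intakes."""
    median = statistics.median(minutes)
    p90 = statistics.quantiles(minutes, n=10)[-1]  # 9 cut points; the last is p90
    return median, p90
```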

Cost per response (our side, LLM spend). Publishing in a range: between $4 and $17 per full RFP response, depending on size, number of draft rounds, and verification cycles. Below $4 we are not yet verifying enough spans; above $17 the customer is probably re-drafting more than the initial draft merited. The range is interesting; the single number is not.
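The composition of that spend is roughly drafting plus verification. A sketch of the arithmetic follows; every price and parameter name is a placeholder for illustration, not our actual rates.

```python
def response_cost(
    input_tokens: int,
    output_tokens: int,
    draft_rounds: int,
    verification_calls: int,
    price_in_per_1k: float = 0.003,      # placeholder per-1K-token prices,
    price_out_per_1k: float = 0.015,     # not our actual rates
    price_per_verification: float = 0.02,
) -> float:
    """Rough LLM spend for one full RFP response: drafting rounds plus
    span-level verification calls."""
    drafting = draft_rounds * (
        input_tokens / 1000 * price_in_per_1k
        + output_tokens / 1000 * price_out_per_1k
    )
    return drafting + verification_calls * price_per_verification
```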

Ungrounded-span flag rate. Publishing: 3.2% of drafted spans get flagged by the verification prompt as ungrounded. Context: the flag is the leading indicator of a claim the retrieval did not support, and a flag sends the draft to human review. 3.2% is too high for autonomous shipping, which is exactly why we do not autonomously ship.
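The flag is not just a metric; it is a routing decision. A minimal sketch of that split, where the `grounded` field is a hypothetical name for the verifier’s verdict:

```python
def triage_spans(spans: list[dict]) -> tuple[float, list[dict]]:
    """Flag rate plus the review queue: any span the verifier marks as
    ungrounded is routed to human review instead of shipping."""
    flagged = [s for s in spans if not s["grounded"]]
    flag_rate = len(flagged) / len(spans) if spans else 0.0
    return flag_rate, flagged
```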

KB blocks promoted from post-mortem write-back. Publishing: average 7.2 blocks per completed bid, across the customer base. This number matters because it is the closest proxy we have for the compounding thesis — a bid that does not write back to the KB does not compound; a bid that writes back several blocks is a bid whose learning survives into the next one.

What we are not publishing and why

Customer names. We have a handful of public customers; most of our customer base is private. We do not name customers publicly without their permission, and we will not put up a logo wall that signals more customers than we have or implies endorsements we did not earn.

Churn rate. Too small a customer base to produce a meaningful number. A percentage-based churn figure on a small base is either 0% (which flatters) or a single-digit-percent figure that conveys more variance than signal. Wave 2 of the State of Proposal Tools will include peer vendor churn where we can cite it; it will not include ours.

Win rates for customers using the product. This is the number everyone wants us to publish. We will not, for two reasons. First, we do not control the buyer’s decision — attributing a win to a tool overstates the tool’s role. Second, even among customers who would agree that PursuitAgent helped, we cannot rigorously separate the product’s contribution from a simultaneous improvement in their internal process, their hiring, their market. A win-rate claim without a controlled comparison is the kind of vendor number that earned the category the skepticism it gets.

NPS. We have an internal number. We are not publishing it because NPS on a very small base of friendly early customers is not a signal worth publishing. Ask again at Wave 2.

The category pattern this counters

Proposal-software vendors publish headline win-rate numbers — “customers win 30% more” — that do not survive scrutiny. Nobody can show the controlled comparison. Nobody can show that win-rate change was due to the product rather than a coincident factor. The category has trained buyers to mistrust vendor metrics, and every category participant pays for that mistrust.

This post is our attempt to shift that posture. Publish what can be defended with methodology; withhold what cannot; say which is which. Over time, the posts make a cumulative argument. A single Y1 metrics post does not close the trust gap. A five-year sequence of them, where the numbers move visibly and the methodology stays consistent, does.

What the next one will add

Wave 2 in February, and a fuller year-two post next December, will add: churn (probably), customer count in aggregate, the eval-suite numbers against an external benchmark if a credible one emerges, and the compounding-write-back metric broken out by customer tenure (so a Y2 customer versus a Y1 customer becomes visible as a multiplier, if the thesis is right).

The numbers we do not publish this year are the ones where the context is not yet meaningful. Next year, some of them will be.

— Bo