Shipped: content-block freshness scores
Every KB block now carries a freshness score that decays as the source ages, as the block's language drifts from the company's current marketing language, or when a more recent block contradicts it. Stale citations get caught at draft time.
Content-block freshness scores shipped to all customer KBs this week. Every block in the knowledge base now carries a freshness value between 0 and 1, computed nightly. Drafting surfaces the freshness alongside the citation; gold-team review can filter on it. This is the engine under the expiration alerts the Knowledge Base page describes: today's marketed surface is a binary "expired / current," while the underlying signal is the richer 0-to-1 score.
Freshness is not a maintenance feature. Outdated KB content actively undermines reviewer trust and, in proposal contexts, can cause the team to ship boilerplate that contradicts current product capability. We have been treating freshness as a first-class signal alongside retrieval relevance for the last two months in beta; it is now in the production path for everyone.
What we score
Three signals feed the freshness number for each block.
Source age. When was the underlying document last uploaded or re-imported? A block from a document that was last touched 18 months ago decays faster than a block from a document touched two weeks ago. The decay function is linear over the first 12 months and steeper thereafter; we tuned the curve against blocks customers manually marked as stale during beta.
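For shape, a minimal sketch of that piecewise decay. The slopes here are illustrative only; the production constants were tuned against beta data.

```python
def source_age_score(age_months: float) -> float:
    """Freshness contribution from source age.

    Piecewise linear: gentle over the first 12 months, steeper after.
    The slopes below are illustrative, not the tuned production values.
    """
    if age_months <= 12:
        # Assumed slope: decay from 1.0 to 0.7 over the first year.
        return 1.0 - 0.025 * age_months
    # Steeper decay beyond 12 months, floored at 0.0.
    return max(0.0, 0.7 - 0.05 * (age_months - 12))
```

On this illustrative curve, a document touched two weeks ago scores about 0.99 and one touched 18 months ago about 0.40, which preserves the ordering described above.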
Marketing-language drift. Every customer’s KB is compared, weekly, against the company’s current marketing site (the customer points us at the URL on setup, and we re-crawl). Blocks whose substantive nouns and named entities have drifted from the marketing language — a block that names a product feature using a term the company no longer uses — score lower on this dimension. The drift score is independent of source age; a recently uploaded block can still drift if the company’s external language has moved faster.
Contradiction with newer blocks. When a newer block in the same KB makes a claim that semantically contradicts an older block, the older block’s freshness drops. The contradiction detector is a small entailment model run on block pairs; it surfaces “block A says X, block B says not-X” candidates for human review and, when confirmed, deprecates the older block.
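The pipeline shape, with the prefilter and entailment model left as stubs (neither model is named here, and both thresholds are assumptions):

```python
from itertools import combinations

def contradiction_candidates(blocks, similarity, p_contradiction,
                             sim_threshold=0.80, nli_threshold=0.90):
    """Yield (older, newer) block pairs as candidates for human review.

    `similarity` is the fast prefilter and `p_contradiction` the entailment
    model's contradiction probability. Both are stand-ins, and the
    thresholds are illustrative.
    """
    for a, b in combinations(blocks, 2):
        if similarity(a.text, b.text) < sim_threshold:
            continue  # prefilter: only semantically close pairs hit the model
        if p_contradiction(a.text, b.text) >= nli_threshold:
            # Order by the source document's import time.
            older, newer = sorted((a, b), key=lambda blk: blk.imported_at)
            yield older, newer  # queued for confirmation, never auto-deprecated
```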
The three signals combine into a single 0-to-1 score as a weighted sum: 0.5 source age, 0.3 marketing-language drift, 0.2 contradiction. We expose all three component scores in the UI for customers who want to inspect them.
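In code, the composition is just that weighted sum. The weights are the published ones; how a missing component is defaulted is an assumption here.

```python
WEIGHTS = {"source_age": 0.5, "marketing_drift": 0.3, "contradiction": 0.2}

def freshness(components: dict[str, float]) -> float:
    """Weighted sum of the three component scores, each in [0, 1].

    Weights match the published composition; defaulting a missing
    component to 1.0 (fresh) is an assumption.
    """
    score = sum(w * components.get(name, 1.0) for name, w in WEIGHTS.items())
    return min(1.0, max(0.0, score))

# e.g. freshness({"source_age": 0.4, "marketing_drift": 0.2,
#                 "contradiction": 1.0})  ->  0.46
```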
Where it surfaces
Two places.
At draft time. When the drafting engine selects a block as a citation candidate, the block's freshness is shown next to the relevance score. A block with high relevance (0.91) and low freshness (0.32) is flagged. The drafting engine still uses the block, since relevance dominates selection, but the citation in the draft renders with an amber freshness indicator. The gold-team reviewer sees the indicator and can choose to escalate.
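The flagging rule itself is simple. The post gives example component values but not the cutoff, so the threshold below is an assumption:

```python
def freshness_indicator(freshness: float, amber_below: float = 0.5) -> str:
    """Colors the citation; relevance alone still drives block selection.

    The 0.5 amber cutoff is assumed -- all we know is that a block with
    relevance 0.91 and freshness 0.32 flags.
    """
    return "amber" if freshness < amber_below else "default"
```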
At review time. The KB management view exposes a freshness column. Customers can sort, filter (“show me all blocks with freshness below 0.5”), and bulk-action on stale blocks. The most common bulk action is “mark for SME re-review,” which routes the block to its registered owner with a link.
What this catches
The most common production case: a customer’s product-overview document was uploaded 14 months ago. The block describing the product’s capabilities is still relevant to incoming RFP questions, but the marketing language has shifted. The product is now described on the marketing site as “policy-aware analytics,” but the block still uses the previous-generation phrase. The drafting engine cites the block; the citation renders with an amber freshness indicator; the gold-team reviewer sees the flag and updates the block before submission.
Without freshness scoring, this case would either ship as-is (with the proposal using terminology the buyer cannot match against the company’s current website) or get caught only by a vigilant reviewer who happened to know the marketing language had changed.
What it doesn’t catch
Freshness is not correctness. A block can be fresh (recent source, current marketing language, no contradicting blocks) and still wrong. We do not claim otherwise. Freshness reduces the rate of stale-citation incidents; it does not eliminate citation errors that have other causes.
The contradiction detector is also conservative. It surfaces high-confidence pairs only and asks for human confirmation before deprecating a block. This means we miss soft contradictions — a block that’s partially superseded by a newer block. We are tracking this as the primary failure mode and have an open work stream on graduated contradiction (the older block’s freshness drops by an amount proportional to the overlap, rather than only deprecating on full contradiction).
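A sketch of where that work stream is headed, with every constant assumed since none of it has shipped:

```python
def graduated_penalty(freshness: float, superseded_fraction: float,
                      max_penalty: float = 0.4) -> float:
    """Planned: drop the older block's freshness in proportion to how much
    of it a newer block supersedes, instead of all-or-nothing deprecation.

    `superseded_fraction` is the overlap in [0, 1]; `max_penalty` is an
    assumed cap, not a shipped constant.
    """
    return max(0.0, freshness - max_penalty * superseded_fraction)
```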
Where to find it
Freshness scoring is on by default for all KBs. No configuration needed. Scores take up to 24 hours to populate (one nightly run); until then, blocks default to 1.0 (treated as fresh).
The KB platform page documents the scoring model in more detail: /platform/knowledge-base. Customers who want to disable any of the three signals can do so per-KB from the settings panel; we don’t expect many to disable, but the marketing-drift signal in particular is sometimes turned off by customers whose marketing site is in flux.
Stack notes
- Source age: computed against documents.imported_at. No external dependency.
- Marketing-language drift: weekly crawl of the customer's marketing URL, embedded against the KB blocks, drift measured by cosine distance between block-vocabulary centroid and marketing-site centroid (sketched after this list).
- Contradiction detection: same entailment verifier we use for citation verification, run on block-pair candidates surfaced by a fast similarity prefilter.
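A sketch of the drift measurement, assuming mean-of-embeddings centroids and a linear map from cosine distance to the [0, 1] component score; neither detail is specified above.

```python
import numpy as np

def drift_component(block_vecs: np.ndarray, site_vecs: np.ndarray) -> float:
    """Marketing-language drift component for one block.

    Cosine distance between the block-vocabulary centroid and the
    marketing-site centroid, mapped so drifted blocks score lower.
    Mean centroids and the linear map are assumptions.
    """
    block_centroid = block_vecs.mean(axis=0)
    site_centroid = site_vecs.mean(axis=0)
    cos_sim = float(
        np.dot(block_centroid, site_centroid)
        / (np.linalg.norm(block_centroid) * np.linalg.norm(site_centroid))
    )
    cos_dist = 1.0 - cos_sim           # 0 = identical language, 2 = opposite
    return max(0.0, 1.0 - cos_dist)    # assumed map to a [0, 1] score
```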
The verification stack from the grounded retrieval pillar piece is the same stack that powers contradiction detection here. Re-using the verifier was a deliberate choice — we did not want a separate model for this and we did not need one. Same model, same cost profile, same failure modes the eval harness already tracks.