KB block versioning: the five-year commit history

A KB block in PursuitAgent is the unit of citable content that grounds a draft. Each block has a current version, and behind that current version is a five-year (so far) commit history. This post is about what we keep, why we keep it, and how the versioning surfaces in the product.

Why every version

The naive design — store the latest version of each block, overwrite on edit — fails three workflows we care about.

Audit on past responses. When a buyer asks “you said X in your March 2024 response; is that still your position?” the team needs to retrieve the exact text that shipped, not the current text. Overwriting the block makes that question unanswerable. We learned this the hard way from a customer who had migrated from a tool that did not version content: their March 2024 response cited a block that was silently overwritten six months later, and the discrepancy surfaced during a renewal negotiation.

SME approval over time. A block approved by an SME in 2023 does not stay approved when the underlying claim changes in 2024. The approval needs to be tied to a specific version. If we overwrite, we lose the link between the approval and the text that was approved.

Rollback after a regression. When an SME edits a block and the next reviewer flags the edit as wrong, we want to revert. Not by manually retyping the previous text — by promoting an earlier version back to current.

The data-model cost of keeping every version is small (block edits are infrequent and the text is short). The product capabilities the versioning unlocks are not.

The schema

The relevant tables, in pseudo-Drizzle:

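// Column builders from Drizzle's Postgres core; `vector` needs the pgvector extension.
import { integer, pgTable, text, timestamp, uuid, vector } from "drizzle-orm/pg-core";

export const kbBlocks = pgTable("kb_blocks", {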
  id: uuid("id").primaryKey(),
  companyId: uuid("company_id").notNull(),
  currentVersionId: uuid("current_version_id"), // points into kb_block_versions
  title: text("title"),
  ownerUserId: uuid("owner_user_id"),
  createdAt: timestamp("created_at").defaultNow(),
});

export const kbBlockVersions = pgTable("kb_block_versions", {
  id: uuid("id").primaryKey(),
  blockId: uuid("block_id").references(() => kbBlocks.id),
  versionNumber: integer("version_number").notNull(),
  body: text("body").notNull(),
  embedding: vector("embedding", { dimensions: 3072 }),
  authorUserId: uuid("author_user_id"),
  changeNote: text("change_note"),
  approvedAt: timestamp("approved_at"),
  approvedByUserId: uuid("approved_by_user_id"),
  retiredAt: timestamp("retired_at"),
  createdAt: timestamp("created_at").defaultNow(),
});

export const kbBlockUsages = pgTable("kb_block_usages", {
  id: uuid("id").primaryKey(),
  blockVersionId: uuid("block_version_id").references(() => kbBlockVersions.id),
  proposalId: uuid("proposal_id"),
  proposalSectionId: uuid("proposal_section_id"),
  citedAt: timestamp("cited_at").defaultNow(),
});

Three things to notice. First, every version carries its own embedding — retrieval can be done against the current version (default) or against any historical version (rare, but used in audit workflows). Second, every citation in a proposal points to a blockVersionId, not a blockId — so the proposal record is anchored to the exact text that shipped. Third, retiredAt is per-version, not per-block — a single block can have multiple retired versions and one current.
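To make the second point concrete, here is a minimal sketch of an audit lookup that follows usage to version to block. It runs over in-memory arrays rather than the database, and the helper name shippedCitations, the trimmed row shapes, and the sample data are all assumptions made for illustration, not the product's API.

```typescript
// In-memory stand-ins for rows of kb_blocks, kb_block_versions and
// kb_block_usages. Only the columns this lookup needs are modeled.
interface Block { id: string; currentVersionId: string }
interface BlockVersion { id: string; blockId: string; versionNumber: number; body: string }
interface BlockUsage { blockVersionId: string; proposalId: string }

interface ShippedCitation {
  blockId: string;
  citedVersionNumber: number;
  shippedBody: string;   // the text the proposal actually cited
  currentBody: string;   // the text of the block's current version
  changedSince: boolean; // true when the block has a different current version
}

// Hypothetical helper: reconstruct what a proposal cited.
function shippedCitations(
  proposalId: string,
  blocks: Block[],
  versions: BlockVersion[],
  usages: BlockUsage[]
): ShippedCitation[] {
  const out: ShippedCitation[] = [];
  for (const usage of usages) {
    if (usage.proposalId !== proposalId) continue;
    const cited = versions.filter((v) => v.id === usage.blockVersionId)[0];
    if (cited === undefined) continue; // unknown version id: skip
    const block = blocks.filter((b) => b.id === cited.blockId)[0];
    if (block === undefined) continue;
    const current = versions.filter((v) => v.id === block.currentVersionId)[0];
    if (current === undefined) continue;
    out.push({
      blockId: block.id,
      citedVersionNumber: cited.versionNumber,
      shippedBody: cited.body,
      currentBody: current.body,
      changedSince: cited.id !== current.id,
    });
  }
  return out;
}

// Made-up sample data: one block with two versions, cited once at v1.
const sampleBlocks: Block[] = [{ id: "b1", currentVersionId: "b1-v2" }];
const sampleVersions: BlockVersion[] = [
  { id: "b1-v1", blockId: "b1", versionNumber: 1, body: "Data is stored in the US." },
  { id: "b1-v2", blockId: "b1", versionNumber: 2, body: "Data is stored in the US or the EU." },
];
const sampleUsages: BlockUsage[] = [{ blockVersionId: "b1-v1", proposalId: "p-2024-03" }];

const audit = shippedCitations("p-2024-03", sampleBlocks, sampleVersions, sampleUsages);
```

Because the usage row stores a version id, the lookup returns the v1 text for the proposal even though the block has since moved to v2. With a blockId-only citation the same lookup could only return the current text.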

The 18-proposal commit history

Here is what an actual five-year-old block looks like in production. (Customer-anonymized; this is the shape, not the literal text.)

The block is a security-questionnaire answer about data residency. It was originally drafted in 2021 by a security engineer. Over five years it has accumulated 12 versions:

v1 (2021) — original draft. US-only data residency. Used in 4 proposals before the company opened a Frankfurt region.
v2 (2022) — added EU data residency. Edit by the same security engineer. SME-approved within the same day. Used in 7 proposals.
v3 (2022) — small wording fix. No semantic change. Used in 2 proposals.
v4 (2023) — added Singapore region. Approved 11 days after the edit (the SME approval cycle slowed; this is visible in the version metadata and was the trigger for a process change). Used in 3 proposals.
v5 (2023) — added a sentence about cross-region replication that turned out to be incorrect. SME approved it. A reviewer on the next response flagged it. We rolled back.
v6 (2023) — rollback to v4’s text, re-approved. Marked as rolled_back_from: v5.
v7 (2024) — added the cross-region replication sentence with corrected language and a citation to the security architecture doc. Used in 1 proposal.
v8–v11 (2024–2025) — incremental clarifications, mostly SME-driven, all approved within 48 hours.
v12 (2025) — current. Adds India and São Paulo regions. Approved by a different security engineer (the original SME left the company). Used in 1 proposal so far.

That history is queryable. The product surfaces it in the block-detail view as a timeline; engineering uses it for the audit and rollback workflows; the retrieval layer uses the currentVersionId for grounding.

The rollback flow

A rollback is not a delete. The block-detail UI shows the version list with a “promote” button next to each prior version. Promoting a prior version creates a new version (v6 in the history above) whose body matches the promoted version’s text and whose metadata records rolled_back_from, a field not shown in the schema excerpt above. We never alter prior versions; the history is append-only.

This matters because past proposals that cited v5 still cite v5 — the rollback does not invalidate prior citations. The proposals that shipped with v5’s incorrect language are still recoverable in their original form, and the audit trail shows that the team became aware of the issue in 2023 and corrected forward.
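The append-only promote can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the function name promoteVersion, the BlockState shape, the caller-supplied newId, and the rolledBackFrom field (standing in for the rolled_back_from metadata) are hypothetical, and re-approval of the new version is not modeled.

```typescript
interface Version {
  id: string;
  blockId: string;
  versionNumber: number;
  body: string;
  // Hypothetical field for the rolled_back_from metadata: the id of the
  // version that was current when the rollback happened.
  rolledBackFrom: string | null;
}

interface BlockState {
  currentVersionId: string;
  versions: readonly Version[];
}

// Promote an earlier version by appending a copy of its body as a new
// version. Existing versions are never modified or removed.
function promoteVersion(state: BlockState, promotedId: string, newId: string): BlockState {
  const promoted = state.versions.filter((v) => v.id === promotedId)[0];
  if (promoted === undefined) {
    throw new Error("unknown version " + promotedId);
  }
  const nextNumber =
    state.versions.reduce((max, v) => Math.max(max, v.versionNumber), 0) + 1;
  const created: Version = {
    id: newId,
    blockId: promoted.blockId,
    versionNumber: nextNumber,
    body: promoted.body,
    rolledBackFrom: state.currentVersionId,
  };
  // concat returns a new array, so the input state is left untouched.
  return { currentVersionId: created.id, versions: state.versions.concat([created]) };
}

// Made-up data mirroring the v4 / v5 / v6 step in the history above.
const beforeRollback: BlockState = {
  currentVersionId: "v5",
  versions: [
    { id: "v4", blockId: "b1", versionNumber: 4, body: "US, EU and Singapore.", rolledBackFrom: null },
    { id: "v5", blockId: "b1", versionNumber: 5, body: "US, EU and Singapore, replicated.", rolledBackFrom: null },
  ],
};
const afterRollback = promoteVersion(beforeRollback, "v4", "v6");
```

In this sketch v5 remains in the version list with its original body, so any usage row pointing at v5 still resolves to the text that shipped.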

Citations in flight

When a draft cites a block, it cites the current version at the time of citation. The proposal stores the blockVersionId. If the block updates after the citation is created, the proposal’s citation does not auto-update — the writer has to choose to refresh.

This is a deliberate UX decision. Auto-updating would be safer in some sense (the block is more current) and dangerous in another (the writer reviewed v8, and now the citation points to v9 which they have not reviewed). We default to anchoring the citation to the version the writer saw, with a “block updated since you cited it — review the diff?” prompt at the next draft pass.
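A minimal sketch of the anchoring rule, assuming in-memory data: citations stay pinned until the writer explicitly refreshes, and a separate check lists the ones that deserve a diff prompt. The names citationsNeedingReview and refreshCitation are hypothetical, and the blockId on each citation is denormalized here for brevity (in the schema it is reachable through the cited version).

```typescript
interface Citation {
  proposalSectionId: string;
  blockId: string;        // denormalized for this sketch only
  blockVersionId: string; // the version the writer actually reviewed
}

interface ReviewPrompt {
  proposalSectionId: string;
  blockId: string;
  citedVersionId: string;
  currentVersionId: string;
}

// List citations whose block now has a different current version.
// Nothing is rewritten here; the result only drives a prompt.
function citationsNeedingReview(
  citations: Citation[],
  currentVersionByBlock: Record<string, string>
): ReviewPrompt[] {
  const prompts: ReviewPrompt[] = [];
  for (const c of citations) {
    const current = currentVersionByBlock[c.blockId];
    if (current === undefined || current === c.blockVersionId) continue;
    prompts.push({
      proposalSectionId: c.proposalSectionId,
      blockId: c.blockId,
      citedVersionId: c.blockVersionId,
      currentVersionId: current,
    });
  }
  return prompts;
}

// The explicit refresh: called only after the writer accepts the diff.
// Returns a new citation and leaves the original object unchanged.
function refreshCitation(citation: Citation, currentVersionId: string): Citation {
  return {
    proposalSectionId: citation.proposalSectionId,
    blockId: citation.blockId,
    blockVersionId: currentVersionId,
  };
}

// Made-up data: block b1 moved from v8 to v9 after it was cited.
const draftCitations: Citation[] = [
  { proposalSectionId: "s1", blockId: "b1", blockVersionId: "b1-v8" },
  { proposalSectionId: "s2", blockId: "b2", blockVersionId: "b2-v3" },
];
const prompts = citationsNeedingReview(draftCitations, { b1: "b1-v9", b2: "b2-v3" });
```

Keeping the check and the refresh as separate steps is what makes the update a writer decision rather than a side effect of someone else's edit.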

What this enables

A few queries that fall out cheaply once the schema is in place:

“Which blocks have not been edited in 18 months?” The freshness-score query. (We have written about freshness scoring as a product feature.)
“Which past proposals cited the now-retired version of block X?” The audit query. Useful when a buyer-side question references a prior response and the team needs to know whether the answer has changed.
“Show me every change to block X by author.” The accountability query. Used in white-team retrospectives.
“Roll back block X to its state on 2024-08-15.” The forensic recovery query. Rare, but the existence of the capability changes how the team treats edits.
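Three of these queries can be sketched as plain functions over in-memory rows. The function names and the trimmed row shapes are assumptions for illustration; in production these would presumably be SQL over the tables above. The as-of lookup also assumes that the version current on a given date is the most recently created one at or before that date, which the post does not state explicitly.

```typescript
interface VersionRow {
  id: string;
  blockId: string;
  createdAt: Date;
  retiredAt: Date | null;
}
interface UsageRow { blockVersionId: string; proposalId: string }

// Freshness: ids of blocks whose newest version is older than `months`
// months at `asOf`.
function blocksNotEditedSince(versions: VersionRow[], asOf: Date, months: number): string[] {
  const cutoff = new Date(asOf.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - months);
  const newest: Record<string, number> = {};
  for (const v of versions) {
    const seen = newest[v.blockId];
    const t = v.createdAt.getTime();
    if (seen === undefined || t > seen) newest[v.blockId] = t;
  }
  return Object.keys(newest)
    .filter((id) => (newest[id] as number) < cutoff.getTime())
    .sort();
}

// Audit: distinct proposals that cited any retired version of a block.
function proposalsCitingRetired(blockId: string, versions: VersionRow[], usages: UsageRow[]): string[] {
  const retiredIds = versions
    .filter((v) => v.blockId === blockId && v.retiredAt !== null)
    .map((v) => v.id);
  const proposals: string[] = [];
  for (const u of usages) {
    if (retiredIds.indexOf(u.blockVersionId) === -1) continue;
    if (proposals.indexOf(u.proposalId) === -1) proposals.push(u.proposalId);
  }
  return proposals;
}

// Forensic recovery, step one: find the version in effect on a date
// (assumed to be the latest one created at or before that date).
function versionAsOf(blockId: string, versions: VersionRow[], date: Date): VersionRow | undefined {
  const candidates = versions
    .filter((v) => v.blockId === blockId && v.createdAt.getTime() <= date.getTime())
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime());
  return candidates[0];
}

// Made-up rows: b1 was last edited in 2022, b2 in 2025.
const rows: VersionRow[] = [
  { id: "b1-v1", blockId: "b1", createdAt: new Date("2021-03-01T00:00:00Z"), retiredAt: new Date("2022-05-01T00:00:00Z") },
  { id: "b1-v2", blockId: "b1", createdAt: new Date("2022-05-01T00:00:00Z"), retiredAt: null },
  { id: "b2-v1", blockId: "b2", createdAt: new Date("2025-01-10T00:00:00Z"), retiredAt: null },
];
const cited: UsageRow[] = [
  { blockVersionId: "b1-v1", proposalId: "p1" },
  { blockVersionId: "b1-v1", proposalId: "p2" },
  { blockVersionId: "b1-v2", proposalId: "p3" },
  { blockVersionId: "b1-v1", proposalId: "p1" },
];
const stale = blocksNotEditedSince(rows, new Date("2025-06-01T00:00:00Z"), 18);
const affected = proposalsCitingRetired("b1", rows, cited);
const onDate = versionAsOf("b1", rows, new Date("2024-08-15T00:00:00Z"));
```

The rollback itself would then promote the version returned by versionAsOf, so the forensic query reuses the ordinary promote path rather than needing its own write logic.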

What we are still working on

Diff visualization across versions is rougher than we want. We render the diffs as line-level changes, which is fine for short blocks but not great for longer ones (a paragraph reorder shows up as a delete + insert rather than a structural move). A semantic diff — paragraph-level alignment with similarity scoring — is on the backlog.

Cross-block dependency tracking is not in place yet. Block A may cite block B in its body; if B retires, A’s citation to B becomes a dangling reference. Today we surface dangling references in a nightly check; we should be surfacing them at edit time.
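A sketch of what the same check could look like as a save-time gate. The post does not describe how one block references another inside its body, so this version takes the already-extracted referenced block ids as input; danglingReferences, validateEdit, and the retired flag on a block summary are hypothetical names.

```typescript
// Minimal view of a block for this check. How "retired" is derived from
// the per-version retiredAt column is left out; it is an input here.
interface ReferencedBlock {
  id: string;
  retired: boolean;
}

// Return the referenced ids that point at a retired or unknown block,
// de-duplicated and in first-seen order.
function danglingReferences(referencedIds: string[], knownBlocks: ReferencedBlock[]): string[] {
  const dangling: string[] = [];
  for (const id of referencedIds) {
    const target = knownBlocks.filter((b) => b.id === id)[0];
    const isDangling = target === undefined || target.retired;
    if (isDangling && dangling.indexOf(id) === -1) dangling.push(id);
  }
  return dangling;
}

// Hypothetical save-time gate: the edit is flagged before it is stored
// instead of being reported by the nightly job.
function validateEdit(
  referencedIds: string[],
  knownBlocks: ReferencedBlock[]
): { ok: boolean; dangling: string[] } {
  const dangling = danglingReferences(referencedIds, knownBlocks);
  return { ok: dangling.length === 0, dangling: dangling };
}

// Made-up data: b3 is retired and b9 does not exist.
const library: ReferencedBlock[] = [
  { id: "b2", retired: false },
  { id: "b3", retired: true },
];
const editCheck = validateEdit(["b2", "b3", "b9", "b3"], library);
```

The same function could back both the nightly job and the editor, which would keep the two checks from drifting apart.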

The five-year commit history is the kind of thing that does not feel valuable until the moment it is — and then it is the only thing that lets the team answer the question. That is most of what versioning is good for. Sparrow’s content-library essay describes the failure mode of unversioned libraries pretty plainly: stale content surfaces with confidence, and nobody can reconstruct when it stopped being correct. Block versioning is the cheapest insurance against that failure mode we know how to build.