Block schema v3: merging KB blocks and evidence atoms
The schema change that let DDQ evidence live in the same store as proposal answers. What we split, what we merged, and the migration that took a week longer than we planned.
Since launch, we had kept KB blocks and evidence atoms in separate stores. Block schema v3, shipped last week, merges them. This is the design note on why we merged them, how, and what we had to fix along the way.
The original split
KB blocks were our atomic unit for proposal answers. An answer to “Describe your incident response process” was one block: authored text, a freshness timestamp, an owner, a tag set, and version history.
Evidence atoms were our atomic unit for attestations, certifications, audit letters, and penetration-test summaries — the artifacts a DDQ respondent attaches alongside an answer. These lived in a separate evidence_atoms table with their own schema: an artifact URL, an expiry date, an issuer, a scope description, and a checksum.
The split made sense at the time. Blocks were editable prose. Evidence atoms were immutable artifacts. They had different lifecycles and different review workflows.
Why we merged them
Three problems:
- Cross-referencing required join tables. When a DDQ answer cited “our SOC 2 report,” we had a block, an evidence atom, and a linking row. Three writes, three reads, three places to get the relationship wrong. Our answer-provenance graph work exposed how messy the queries got under load.
- Freshness logic lived in two places. Block freshness (when was this last attested?) and evidence freshness (when does this artifact expire?) were computed separately. Reviewers saw two freshness indicators on the same answer and had to mentally reconcile them.
- Retrieval treated them differently. The retriever could surface blocks but not evidence; the DDQ module could surface evidence but didn’t participate in the general retrieval pipeline. A question that asked for both — “What’s your data-residency posture, and provide evidence” — required two separate queries stitched together in application code.
The merge
Block schema v3 adds a kind discriminator to the kb_blocks table: { "prose", "attestation", "artifact" }. All three kinds share the core fields: id, owner, freshness, version, tags, tenant. The kind-specific fields live in a kind_meta jsonb column.
- Prose blocks (the old KB blocks) have kind_meta with authored text and reviewer fields.
- Attestation blocks (the old evidence atoms) have kind_meta with issuer, scope, expiry, checksum, and artifact URL.
- Artifact blocks (new) are a middle ground: a named artifact (like a diagram or a policy PDF) that has prose commentary but isn’t itself authored prose.
Everything else — freshness checks, retrieval, citation rendering, versioning, permissioning — runs on the shared fields and doesn’t branch on kind.
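To make the shape concrete, here is a minimal Python sketch of a block with a kind discriminator and a kind_meta payload. It is an illustration, not our actual model: the class name KBBlock, the REQUIRED_META table, and the exact kind_meta key names (text, reviewer, artifact_url, commentary) are assumptions chosen for the example. Only the shared fields and the three kind values come from the schema described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Literal

Kind = Literal["prose", "attestation", "artifact"]

# Hypothetical required kind_meta keys per kind. The real column is jsonb,
# so any enforcement like this would live in application code.
REQUIRED_META: dict[str, set[str]] = {
    "prose": {"text", "reviewer"},
    "attestation": {"issuer", "scope", "expiry", "checksum", "artifact_url"},
    "artifact": {"artifact_url", "commentary"},
}


@dataclass
class KBBlock:
    # Shared core fields: everything kind-agnostic runs on these.
    id: str
    tenant: str
    owner: str
    freshness: datetime
    version: int
    kind: Kind
    tags: list[str] = field(default_factory=list)
    # Kind-specific fields, mirroring the kind_meta jsonb column.
    kind_meta: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in REQUIRED_META:
            raise ValueError(f"unknown kind: {self.kind!r}")
        missing = REQUIRED_META[self.kind] - set(self.kind_meta)
        if missing:
            raise ValueError(f"{self.kind} block missing kind_meta keys: {sorted(missing)}")


block = KBBlock(
    id="b-1",
    tenant="t-1",
    owner="security",
    freshness=datetime(2025, 3, 15),
    version=1,
    kind="attestation",
    kind_meta={
        "issuer": "Example Auditor",
        "scope": "all production systems",
        "expiry": "2026-03-15",
        "checksum": "abc123",
        "artifact_url": "https://example.invalid/report.pdf",
    },
)
```

Code that only touches id, owner, freshness, version, tags, and tenant never needs to look at kind, which is the point of the pattern.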
What broke in the migration
Two things worth flagging:
Tenant-scoped indexes. The old evidence_atoms table had a tenant-scoped unique index on (tenant_id, external_id) that didn’t exist in the new unified table. Six tenants had evidence atoms with duplicate external IDs that predated the constraint, and the migration script didn’t catch them. We discovered this when the backfill failed at 94%. Fix: a de-dupe script, run before the migration, that reports which atoms get merged.
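The detection half of that de-dupe pass can be sketched in a few lines of Python. This is an illustration under assumptions, not the script we ran: the function name find_duplicate_atoms and the row shape (dicts with id, tenant_id, and external_id keys) are hypothetical, and deciding which atom survives a merge is left out.

```python
from collections import defaultdict


def find_duplicate_atoms(atoms: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group atom ids by (tenant_id, external_id) and keep groups with more than one id."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for atom in atoms:
        groups[(atom["tenant_id"], atom["external_id"])].append(atom["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up rows for illustration only.
atoms = [
    {"id": "a1", "tenant_id": "t1", "external_id": "ext-1"},
    {"id": "a2", "tenant_id": "t1", "external_id": "ext-2"},
    {"id": "a3", "tenant_id": "t1", "external_id": "ext-1"},
    {"id": "a4", "tenant_id": "t2", "external_id": "ext-1"},  # same external id, different tenant
]

duplicates = find_duplicate_atoms(atoms)
for (tenant_id, external_id), ids in duplicates.items():
    print(f"tenant={tenant_id} external_id={external_id} atoms={ids}")
```

Running a report like this before the backfill surfaces every key that would violate a unique (tenant_id, external_id) constraint, so the failure shows up at the start rather than partway through.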
Retrieval recall on attestations. Attestations have short, structured text (“ISO 27001:2022, issued 2025-03-15, scope: all production systems”). The general retriever — tuned for prose blocks — didn’t rank them highly for queries like “ISO 27001 status.” We added a lightweight keyword boost for kind = attestation when the query contains a known certification name. Not elegant, and probably wrong long-term, but it works. The real fix is a separate attestation-specific retrieval head, which is on the roadmap.
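A boost of this kind might look roughly like the following Python sketch. The certification list, the multiplier, the function name, and the candidate tuple shape are all assumptions for illustration; the production values and scoring pipeline are not shown here.

```python
# Hypothetical list of certification names to match against the query.
KNOWN_CERTS = ("iso 27001", "soc 2")
# Hypothetical multiplier applied to attestation scores.
ATTESTATION_BOOST = 1.5


def boost_attestations(
    query: str, candidates: list[tuple[str, str, float]]
) -> list[tuple[str, str, float]]:
    """Re-rank (block_id, kind, score) candidates, boosting attestations
    when the query mentions a known certification name."""
    q = query.lower()
    mentions_cert = any(cert in q for cert in KNOWN_CERTS)
    rescored = [
        (
            block_id,
            kind,
            score * ATTESTATION_BOOST if mentions_cert and kind == "attestation" else score,
        )
        for block_id, kind, score in candidates
    ]
    # sorted() is stable, so ties keep their original retriever order.
    return sorted(rescored, key=lambda c: c[2], reverse=True)


candidates = [("p1", "prose", 0.8), ("a1", "attestation", 0.6)]
ranked = boost_attestations("ISO 27001 status", candidates)
unboosted = boost_attestations("incident response process", candidates)
```

With these made-up scores, the attestation moves ahead of the prose block only for the certification query. Substring matching on a fixed list is exactly the part that is unlikely to hold up long-term.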
What we got right
The discriminator pattern was the right call over separate tables. It let us keep application code that operates on “a KB thing” agnostic of kind — the proposal draft renderer, the citation UI, the permissions layer — while specializing where specialization matters (freshness rules, retrieval boosting).
The freshness unification is the biggest win. A reviewer now sees a single freshness indicator on an answer-plus-evidence pair: if either component is stale, the pair is stale. No mental reconciliation.
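The rule reduces to a logical OR over the two components. A minimal Python sketch, assuming a prose block counts as stale once its last attestation is older than some maximum age (the 180-day default, the function name, and the parameter names are hypothetical) and evidence counts as stale once its expiry date has passed:

```python
from datetime import date, timedelta


def is_pair_stale(
    last_attested: date,
    evidence_expiry: date,
    today: date,
    max_age_days: int = 180,  # hypothetical threshold for prose freshness
) -> bool:
    """An answer-plus-evidence pair is stale if either component is stale."""
    prose_stale = (today - last_attested) > timedelta(days=max_age_days)
    evidence_stale = evidence_expiry < today
    return prose_stale or evidence_stale


today = date(2025, 6, 1)
fresh_pair = is_pair_stale(date(2025, 3, 1), date(2026, 3, 15), today)
expired_evidence = is_pair_stale(date(2025, 3, 1), date(2025, 5, 1), today)
old_prose = is_pair_stale(date(2024, 6, 1), date(2026, 3, 15), today)
```

Here fresh_pair is False, while expired_evidence and old_prose are both True: one stale component is enough to mark the whole pair.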
What’s next
The attestation retrieval head, mentioned above. And a related schema change for DDQ-specific constraints (same-tenant-only evidence attachment, required-evidence flags on specific block types) — that change is in design, not yet scheduled. The evidence-vault retrospective on Wednesday has more on what lives in the vault now and how expiry alerts catch the drift.