Maryk + FoundationDB: Storage Layout
This document explains how Maryk stores data in FoundationDB (FDB): what subspaces are used, how keys and values are encoded, and how versioning, soft deletes, uniques, and indexes work. It is meant to be approachable for contributors and readers new to both Maryk and FDB.
- Every DataModel gets its own FDB directory (subspace) tree under a configurable root.
- We keep small, well‑known subspaces per model: `keys`, `table`, `unique`, `index`. If history is enabled, we also have `*_versioned` variants.
- “Latest” values live in `table` and are encoded as `(version || value)`. History (if enabled) lives in separate subspaces keyed by an inverted version suffix to keep “latest first” ordering.
- Soft deletes are object‑level flags stored as a special qualifier. Hard deletes clear all records (and history) for a key.
- Uniques and Indexes are first‑class citizens with current and (optionally) historic representations.
Store metadata directory
In addition to the per-model directories, the store reserves a shared directory at `['__meta__', 'models_by_id']`. It keeps a simple map from model id to model name so the datastore can verify that the configured `dataModelsById` matches what was previously persisted before touching any model data. Entries are stored as:
- Key: directory prefix + 4-byte big-endian model id.
- Value: UTF-8 encoded model name.
The dedicated directory already isolates this metadata, so no extra prefix byte is required.
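As a rough illustration of the entry layout above, here is a minimal Python sketch. The function name `meta_entry` and the example prefix bytes are hypothetical and not part of Maryk; the sketch also assumes the model id is an unsigned 32-bit integer.

```python
import struct


def meta_entry(directory_prefix: bytes, model_id: int, model_name: str) -> tuple[bytes, bytes]:
    """Build the (key, value) pair for one models_by_id entry."""
    # Key: directory prefix followed by the model id as 4 bytes, big-endian.
    # Assumption: the id is treated as unsigned (">I").
    key = directory_prefix + struct.pack(">I", model_id)
    # Value: the model name encoded as UTF-8.
    value = model_name.encode("utf-8")
    return key, value


# Hypothetical prefix bytes; a real prefix is allocated by the DirectoryLayer.
key, value = meta_entry(b"\x15\x2a", 1, "Person")
```

Because the id is fixed-width and big-endian, entries sort by numeric model id within the directory.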
Directory (Subspace) Layout
Per DataModel (e.g. Log, Person) we create these subspaces:
- `meta` – stored schema and migration state for the DataModel.
- `keys` – existence + creation version per object key.
- `table` – latest property values per object key, plus a small “latest version” marker per key.
- `unique` – latest unique constraints (per unique property or composite).
- `index` – latest secondary indexes.
If keepAllVersions = true, we also create historic subspaces:
- `table_versioned` – all historical values per object key.
- `unique_versioned` – historic unique entries (tombstones and key snapshots).
- `index_versioned` – historic index entries (tombstones and value snapshots).
All of the above are regular FDB subspaces created via the DirectoryLayer. They give us prefix isolation so we can read/write/scan per model efficiently.
Keys and Qualifiers
Maryk uses a “row/column” style: the full FDB key is the subspace prefix + the Maryk key + a “qualifier” representing a property (or property+collection item).
Examples (pseudocode):
- Latest value for property: `tablePrefix + key + propertyQualifier` → `(version || value)`
- Object soft‑delete flag: `tablePrefix + key + [0x00]` → `(version || [0x01])`
- Creation timestamp: `keysPrefix + key` → `version` (no value payload)
- Latest version marker: `tablePrefix + key` → `version` (no qualifier; used to derive `lastVersion`)
The “qualifier” is generated from Maryk’s property references. Collections (list/set/map) qualify by index or element key. Embedded/object markers use small type markers.
Value Encoding
- Latest values in `table`: stored as `(version || value)`
  - `version` is Maryk’s HLC timestamp (8 bytes, big‑endian). We use it for concurrency control, “last write wins”, and to expose `firstVersion`/`lastVersion` to clients.
- Historic values in `table_versioned`: stored on a separate key with inverted version bytes in the key suffix:
  - Key: `historicTablePrefix + key + encodeZeroFree(qualifier) + 0x00 + inverted(version)`
  - Value: just `value` (no version prefix, since it is already in the key)
  - Inverting bytes for the version makes newer versions sort before older versions lexicographically, so a forward range scan gives “latest first”.
Maryk’s value serialization is reused across storage engines. Simple types are written directly, and for wrappers (e.g. enums, typed values) the inner type bytes are composed accordingly.
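The two encodings above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and `encode_zero_free` uses a simple escaping scheme chosen for this example (the document does not specify how Maryk's `encodeZeroFree` works). The only property relied on here is that the encoded qualifier contains no `0x00` byte.

```python
import struct


def encode_latest(version: int, value: bytes) -> bytes:
    """Latest cell value: 8-byte big-endian version followed by the value bytes."""
    return struct.pack(">Q", version) + value


def invert(version: int) -> bytes:
    """Flip every bit of the 8-byte big-endian version so newer sorts first."""
    return bytes(b ^ 0xFF for b in struct.pack(">Q", version))


def encode_zero_free(data: bytes) -> bytes:
    """Stand-in escaping (an assumption, not Maryk's actual scheme):
    0x00 -> 0x01 0x01 and 0x01 -> 0x01 0x02, so the output never contains 0x00."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\x01\x01"
        elif b == 0x01:
            out += b"\x01\x02"
        else:
            out.append(b)
    return bytes(out)


def historic_key(prefix: bytes, key: bytes, qualifier: bytes, version: int) -> bytes:
    """Historic cell key: prefix + key + zero-free qualifier + 0x00 + inverted version."""
    return prefix + key + encode_zero_free(qualifier) + b"\x00" + invert(version)
```

Since the version is fixed-width and bit-inverted, comparing two historic keys for the same cell byte by byte puts the higher version first.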
Versioning
Write transactions always record a new version (HLC timestamp) for every changed value. When `keepAllVersions = true`, we mirror the write into the historic subspace using the inverted version in the key. Readers can then:
- Read the latest value from `table`.
- Read as of a `toVersion` by scanning the historic subspace only up to the inverted `toVersion` (first match = latest ≤ `toVersion`).
On the historic/versioned tables the version is encoded in the key suffix with inverted bytes, so forward range scans yield latest‑first. Qualifiers are encoded to contain no 0x00 bytes. A single 0x00 separator is inserted between the (encoded) qualifier and the inverted version. This guarantees the separator is the first zero byte and preserves correct lexicographic ordering without extra buffering during scans.
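To make the “first match = latest ≤ `toVersion`” idea concrete, here is a small in-memory Python sketch. It models a range read as a sorted list of `(key, value)` pairs and seeks with `bisect`; the function names and sample data are hypothetical, and the cell prefix is assumed to already be zero-free. A real implementation would issue an FDB range read instead.

```python
import bisect
import struct


def invert(version: int) -> bytes:
    """Bit-invert the 8-byte big-endian version so newer versions sort first."""
    return bytes(b ^ 0xFF for b in struct.pack(">Q", version))


def read_as_of(sorted_rows, cell_prefix: bytes, to_version: int):
    """Return the newest value with version <= to_version, or None.

    sorted_rows: list of (key, value) tuples sorted by key, standing in for
    the ordered key space of the historic subspace.
    cell_prefix: historic prefix + object key + zero-free qualifier.
    """
    cell = cell_prefix + b"\x00"  # separator before the inverted version
    start = cell + invert(to_version)
    # Seek to the first key >= start; (start,) sorts before (start, value).
    i = bisect.bisect_left(sorted_rows, (start,))
    if i < len(sorted_rows) and sorted_rows[i][0].startswith(cell):
        return sorted_rows[i][1]
    return None


# Hypothetical sample data: one cell written at versions 10, 20 and 30.
cell_prefix = b"T" + b"K1" + b"\x09"
rows = sorted(
    (cell_prefix + b"\x00" + invert(v), f"v{v}".encode()) for v in (10, 20, 30)
)
```

With this data, reading as of version 25 seeks past the version 30 entry and lands on the version 20 entry, while reading as of version 5 finds nothing for the cell.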
Soft Delete vs Hard Delete
- Soft delete sets the special “object delete” qualifier on the `table` (and historic if enabled), recording `(version || true)`.
- Hard delete clears:
  - `keysPrefix + key`
  - All `tablePrefix + key + ...`
  - All `table_versioned + key + ...` (if present)
  - And prunes related unique/index entries.
Consumers can request filterSoftDeleted = true and Maryk will transparently hide soft‑deleted objects.
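The difference between the two delete modes can be sketched with a plain dictionary standing in for the FDB key space. Everything here is illustrative: the function names are hypothetical, object keys are assumed to be fixed-length so that a prefix match on `prefix + key` cannot hit a different object, and pruning of unique/index entries is left out.

```python
import struct

SOFT_DELETE_QUALIFIER = b"\x00"  # the special "object delete" qualifier


def soft_delete(store: dict, table_prefix: bytes, key: bytes, version: int) -> None:
    """Record (version || 0x01) under the object-delete qualifier; data stays in place."""
    store[table_prefix + key + SOFT_DELETE_QUALIFIER] = struct.pack(">Q", version) + b"\x01"


def is_soft_deleted(store: dict, table_prefix: bytes, key: bytes) -> bool:
    """True if the object-delete flag is present and set."""
    cell = store.get(table_prefix + key + SOFT_DELETE_QUALIFIER)
    return cell is not None and cell[8:] == b"\x01"


def hard_delete(store: dict, prefixes: list, key: bytes) -> None:
    """Remove every entry for `key` under each given subspace prefix.

    Assumption: keys are fixed-length, so startswith(prefix + key) only
    matches this object. Unique/index pruning is omitted from the sketch.
    """
    for fdb_key in list(store):
        if any(fdb_key.startswith(p + key) for p in prefixes):
            del store[fdb_key]
```

A reader honoring `filterSoftDeleted = true` would consult something like `is_soft_deleted` before returning an object, whereas after a hard delete there is nothing left to consult.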
Uniques
Uniques are stored in `unique` as:
- Key: `uniquePrefix + (uniqueRef || valueBytes)`
- Value: `(version || keyBytes)`
On insertion, we first read to check if a value already exists, and fail if the unique is taken. On update/delete we remove the unique entry (and, if historic is enabled, write a historic “tombstone” or snapshot into unique_versioned).
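The read-then-write check on insertion might look roughly like the following Python sketch, with a dictionary standing in for the transaction's view of the `unique` subspace. The names `put_unique` and `UniqueViolation` are hypothetical; in the real store the read and the write happen inside one FDB transaction so that concurrent inserts conflict rather than both succeeding.

```python
import struct


class UniqueViolation(Exception):
    """Raised when the unique value is already taken (hypothetical name)."""


def put_unique(store: dict, unique_prefix: bytes, unique_ref: bytes,
               value_bytes: bytes, version: int, key_bytes: bytes) -> None:
    """Claim a unique value for an object key, failing if it is already claimed."""
    fdb_key = unique_prefix + unique_ref + value_bytes
    # Read first: an existing entry means the unique value is taken.
    if fdb_key in store:
        raise UniqueViolation(f"unique value already in use: {value_bytes!r}")
    # Value layout: 8-byte big-endian version followed by the owning object key.
    store[fdb_key] = struct.pack(">Q", version) + key_bytes
```

Storing the owning key in the value lets a lookup by unique value resolve directly to the object key.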
Indexes
Indexes are stored in `index` as:
- Key: `indexPrefix + indexRef + (indexValueBytes || keyBytes)`
- Value (latest): `version`
- Historic: `index_versioned + encodeZeroFree(indexRef || (indexValueBytes || keyBytes)) + 0x00 + inverted(version)` → entries record index changes. Historic index scans read these entries within the computed range and map back to keys for ordering and filtering at a given `toVersion`.
This design enables:
- Efficient scans in index order with or without a starting key.
- Partial prefix scans (e.g. “all rows for this severity”).
- Historic scans (if enabled) to see which keys matched an index at a given time. Historic index scanning is supported and used when `toVersion` is provided.
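A partial prefix scan over the latest index can be illustrated with a sorted list of keys in Python. The names are hypothetical, and the sketch assumes a single-byte index value (a "severity") and fixed-length object keys purely to keep the example small; Maryk's actual index value encoding is not described here.

```python
import bisect


def index_key(index_prefix: bytes, index_ref: bytes,
              index_value: bytes, key_bytes: bytes) -> bytes:
    """Latest index key: prefix + index reference + index value + object key."""
    return index_prefix + index_ref + index_value + key_bytes


def prefix_scan(sorted_keys: list, prefix: bytes) -> list:
    """Return all keys starting with `prefix`, in key order.

    sorted_keys stands in for the ordered FDB key space of the index subspace.
    """
    i = bisect.bisect_left(sorted_keys, prefix)  # seek to the first candidate
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches


# Hypothetical data: (severity, object key) pairs indexed under reference 0x01.
entries = [(2, b"k1"), (1, b"k2"), (2, b"k3"), (3, b"k4")]
keys = sorted(index_key(b"I", b"\x01", bytes([sev]), k) for sev, k in entries)
severity_two = prefix_scan(keys, b"I" + b"\x01" + bytes([2]))
```

Because the object key is the tail of the index key, stripping the scanned prefix from each match yields the primary keys in index order.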
Get, Scan, and Changes
- Get by key: check `keys` for existence and creation version, apply filters (including soft delete), then read values out of `table` or `table_versioned` depending on `toVersion`.
- Scan by key: compute key ranges from the model and filters, walk `keys` in ASC or DESC, apply filters, and collect up to `limit`.
- Scan by index: build index ranges from the filter and order, scan `index` (value+key) and map back to primary keys. When `toVersion` is provided, perform a historic index scan by computing the index value at that version per key and applying the same ordering and range filters.
- Changes APIs (GetChanges/ScanChanges): instead of returning full values, we stream `VersionedChanges` (creation + field changes) between `fromVersion..toVersion`, with `maxVersions` limiting per field.
Filtering
Maryk’s filter DSL (Exists, Equals, Range, Regex, etc.) is evaluated by matching qualifiers to property references:
- For exact property references we do a direct get on `table` or scan `table_versioned` up to `toVersion`.
- For “fuzzy” references (like any element in a list/map) we scan the qualifier space under the property prefix.
Soft delete filtering is layered in: if filterSoftDeleted is true, we check the soft delete indicator and hide those rows.
Error Handling & Validations
All writes occur in a single FDB transaction per request. We do read‑for‑write validations (uniques, parent existence for nested values) inside the same transaction to avoid races. The transaction is retried by FDB on conflicts, and we propagate Maryk validation errors (e.g. unique violation) back to clients as structured responses.
Why use separate historic subspaces?
Keeping current data in `table` and history in `table_versioned` gives us:
- small and fast lookups for “latest” queries;
- predictable scans for “as of” queries without mixing current and historic writes;
- forward‑only streaming (because of inverted version suffix) for time windows.
This maps neatly to FoundationDB’s range read strengths, while preserving Maryk’s model semantics.