A p2p merklized database index.
Okra is a key/value store augmented with a merkle tree index. You can use it like a regular key/value store, with get
/set
/delete
methods and an entries()
iterator. The merkle tree enables efficient syncing by iterating over the missing, extra, or conflicting entries between a local and remote Okra database.
npm i @canvas-js/okra @canvas-js/okra-lmdb @canvas-js/okra-memory
The core package @canvas-js/okra
has a generic tree that must be provided with a backing store.
The @canvas-js/okra-lmdb
and @canvas-js/okra-memory
instantiate the generic pure-JS tree with a persistent LMDB database and an in-memory red/black tree, respectively.
import { Tree } from "@canvas-js/okra-lmdb"
const tree = new Tree("path/to/db")
try {
// ...
} finally {
tree.close()
}
import { Tree } from "@canvas-js/okra-memory"
const tree = new Tree()
// ...
Okra is designed as a thin wrapper around an abstract key/value store interface. The details of how Okra stores the merkle tree nodes in the underlying key/value store are documented in the Zig implementation repo.
The examples here all use @canvas-js/okra-memory
for simplicity.
⚠️ Never try to directly access or mutate the entries of the underlying key/value store. Only interact with the store through theTree.read(...)
andTree.write(...)
transaction interfaces.
The tree can be used as a normal key/value store with .get
, .set
, and .delete
methods.
import { fromString, toString } from "uint8arrays"
import { Tree } from "@canvas-js/okra-memory"
const tree = new Tree()
await tree.write((txn) => {
tree.set(fromString("a"), fromString("foo"))
tree.set(fromString("b"), fromString("bar"))
tree.set(fromString("c"), fromString("baz"))
})
await tree.read((txn) => {
const value = txn.get(fromString("a"))
console.log(toString(value)) // "foo"
})
await tree.write((txn) => {
tree.delete(toString("a"))
})
await tree.read((txn) => {
const value = tree.get(encoder.encode("a"))
console.log(value) // null
})
You can iterate over ranges of entries with txn.entries()
.
await tree.read((txn) => {
console.log([...txn.entries()])
// [
// [ <Buffer 61>, <Buffer 66 6f 6f> ],
// [ <Buffer 62>, <Buffer 62 61 72> ],
// [ <Buffer 63>, <Buffer 62 61 7a> ]
// ]
console.log([...txn.entries({ reverse: true })])
// [
// [ <Buffer 63>, <Buffer 62 61 7a> ],
// [ <Buffer 62>, <Buffer 62 61 72> ],
// [ <Buffer 61>, <Buffer 66 6f 6f> ]
// ]
console.log([
...txn.entries({
lowerBound: { key: fromString("b"), inclusive: true },
upperBound: { key: fromString("c"), inclusive: false },
}),
])
// [
// [ <Buffer 62>, <Buffer 62 61 72> ]
// ]
})
You can access the internal merkle tree nodes using the getRoot
, getNode
, and getChildren
methods. These are the methods that must be accessible to other Okra databases, such as over a WebSocket connection. okra-js itself is transport-agnostic.
await tree.read((txn) => txn.getRoot())
// { level: 1, key: null, hash: <Buffer 62 46 b9 40 74 d0 9f eb 64 4b e1 a1 c1 2c 1f 50> }
await tree.read((txn) => txn.getChildren(1, null))
// [
// {
// level: 0,
// key: null,
// hash: <Buffer af 13 49 b9 f5 f9 a1 a6 a0 40 4d ea 36 dc c9 49>
// },
// {
// level: 0,
// key: <Buffer 61>,
// hash: <Buffer 2f 26 b8 5f 65 eb 9f 7a 8a c1 1e 79 e7 10 14 8d>,
// value: <Buffer 66 6f 6f>
// },
// {
// level: 0,
// key: <Buffer 62>,
// hash: <Buffer 68 4f 10 47 a1 78 e6 cf 9f ff 75 9b a1 ed ec 2d>,
// value: <Buffer 62 61 72>
// },
// {
// level: 0,
// key: <Buffer 63>,
// hash: <Buffer 56 cb 13 c7 88 23 52 5b 08 d4 71 b6 c1 20 13 60>,
// value: <Buffer 62 61 7a>
// }
// ]
Note that the leaf anchor node at (level = 0, key = null)
doesn't have a value, but the rest of the leaf nodes do, since they represent entries in the external key/value store.
First, you must have an instance of the SyncSource
interface for the remote Okra database you want to sync with. If you're exposing the merkle tree via an HTTP API, you'll have to write a little client implementing SyncSource
that uses fetch
, or whatever is appropriate for your transport.
export interface SyncSource {
getRoot(): Promise<Node>
getNode(level: number, key: Key): Promise<Node | null>
getChildren(level: number, key: Key): Promise<Node[]>
}
Tree
itself implements SyncSource
, so we can easily demonstrate the sync methods using two local databases.
import { sha256 } from "@noble/hashes/sha256"
import { Tree, sync, collect } from "@canvas-js/okra"
import { Tree } from "@canvas-js/okra-memory"
// create two in-memory trees
const source = new Tree()
const target = new Tree()
// initialize them with the same 256 random entries
{
function init(txn) {
for (let i = 0; i < 256; i++) {
const key = new Uint8Array([i])
txn.set(key, sha256(key))
}
}
await source.write(init)
await target.write(init)
}
// delete one entry from source
await source.write((txn) => txndelete(new Uint8Array([0x21])))
// delete one entry from target
await target.delete((txn) => txn.delete(new Uint8Array([0x44])))
// set conflicting values for another entry
await source.write((txn) => txn.set(new Uint8Array([0x04]), fromString("foo")))
await target.write((txn) => txn.set(new Uint8Array([0x04]), fromString("bar")))
for await (const delta of sync(source, target)) {
console.log(delta)
}
// {
// key: <Buffer 04>,
// source: <Buffer 66 6f 6f>,
// target: <Buffer 62 61 72>
// }
// {
// key: <Buffer 21>,
// source: null,
// target: <Buffer bb 72 08 bc 9b 5d 7c 04 f1 23 6a 82 a0 09 3a 5e 33 f4 04 23 d5 ba 8d 42 66 f7 09 2c 3b a4 3b 62>
// }
// {
// key: <Buffer 44>,
// source: <Buffer 3f 39 d5 c3 48 e5 b7 9d 06 e8 42 c1 14 e6 cc 57 15 83 bb f4 4e 4b 0e bf da 1a 01 ec 05 74 5d 43>,
// target: null
// }
The sync
export takes a SyncSource
and SyncTarget
and returns an async generator that yields delta: Delta
objects with key
, source
, and target
properties. delta.key
is always a Uint8Array
. delta.source === null && delta.target !== null
represents an entry that the target has but the source is missing, delta.source !== null && delta.target === null
represents an entry that the source has but the target is missing, and delta.source !== null && delta.target !== null
represents an entry for which the source and target have different values. delta.source
and delta.target
are never both null
.
Calling sync(source, target)
does not automatically modify target
- it only iterates over the differences. Taking action in response to each delta is up to you! Here are three examples of using sync
.
import { sync } from "@canvas-js/okra"
async function pull(source: SyncSource, target: SyncTarget): Promise<void> {
for await (const delta of sync(source, target)) {
if (delta.source === null) {
continue
} else if (delta.target === null) {
await target.set(delta.key, delta.source)
// do more stuff to process the new value
// ...
} else {
throw new Error(`Conflict at key ${hex(delta.key)}`)
}
}
}
Call await pull(source, tree)
to pull in missing entries that source
has but tree
doesn't, doing nothing for entries in tree
missing in source
, and throwing an error if the values for a key conflict. Only use this if the values of keys don't change! One example where this is useful is in content-addressed systems where keys are derived from the hashes of immutable values.
One way of looking at pull
is that it implements a grow-only set with an efficient union operation. This pattern is useful, for example, as a persistence layer for operation-based CRDT systems, where it allows arbitrary peers to directly sync their sets of operations without relying on tracking methods that scale linearly in the number of peers.
import { sync } from "@canvas-js/okra"
async function copy(source: SyncSource, target: SyncTarget): Promise<void> {
for await (const delta of sync(source, target)) {
if (delta.source === null) {
await target.delete(delta.key)
} else {
await target.set(delta.key, delta.source)
}
}
}
Call copy(source, tree)
to copy the remote source, deleting any local entries that source
doesn't have, setting new entries for keys that source
has that tree
doesn't, and adopting source
's value for any conflicting keys. By the end, tree
will have the exact same leaf entries, tree structure, and root hash as source
.
import { sync } from "@canvas-js/okra"
async function merge(
source: SyncSource,
target: SyncTarget,
resolve: (key: Uint8Array, source: Uint8Array, target: Uint8Array) => Uint8Array | Promise<Uint8Array>
): Promise<void> {
for await (const delta of sync(source, target)) {
if (delta.source === null) {
continue
} else if (delta.target === null) {
await target.set(delta.key, delta.source)
} else {
const value = await resolve(delta.key, delta.source, delta.target)
await target.set(delta.key, value)
}
}
}
Call merge(source, tree, resolve)
to merge entries from a remote source - keeping local entries that source
doesn't have, copying entries from source
that aren't present in tree
, and resolving conflicting values using the provided resolve
method.
In most cases, the resolve
method should deterministically choose one of the two values, but it could also return a new "merged" value. The only hard constraints are that it must be commutative, associative, and idempotent.
- commutativity:
resolve(A, B) == resolve(B, A)
for allA
andB
- associativity:
resolve(A, resolve(B, C)) == resolve(resolve(A, B), C)
for allA
,B
, andC
- idempotence:
resolve(A, A) == A
for allA
The merge method is useful for implementing persistent state-based CRDT systems. If your top-level state and its global merge function can be represented as a key/value map with entry-wise merging, you can use Okra to perform p2p state merges in logarithmic time.
okra-js uses the debug
package for logging. In NodeJS, you can turn on logging to stdout by setting a DEBUG=okra:*
environment variable. In the browser, you can turn on console logging by setting localStorage.debug = 'okra:*'
.
Also useful is the tree.print()
method, which pretty-prints the merkle tree structure to a utf-8 AsyncIterableIterator<Uint8Array>
stream. In NodeJS, you can pipe this directly to stdout or consume the entire output with the text
utility method from stream/consumers
:
import { text } from "node:stream/consumers"
import { sha256 } from "@noble/hashes/sha256"
import { printTree } from "@canvas-js/okra"
import { Tree } from "@canvas-js/okra-memory"
const tree = new Tree()
await tree.write(async (txn) => {
for (let i = 0; i < 256; i++) {
const key = new Uint8Array([i])
tree.set(key, sha256(key))
}
console.log(await text(printTree(txn)))
// ── 10d7126a ┬─ a74fc67d ┬─ af1349b9 |
// │ ├─ 83a4849c | 00
// │ ├─ d789a084 | 01
// │ ├─ cf069554 | 02
// │ ├─ 30c20bbc | 03
// │ ├─ ad5563ad | 04
// │ ├─ d8a59831 | 05
// │ └─ 8c7fe3e1 | 06
// ├─ 9bc31466 ┬─ 063bdfac | 07
// ...
})
Entry values are not printed; the rightmost column is the list of keys of the leaf entries in hex.
Tests are run with AVA and live in ./test/*.test.ts.
npm run test
The two most important things covered by the tests are 1) correctness of the tree update algorithm 2) correctness of the syncing algorithm. Correctness of the tree is tested by comparing the underlying database entry-by-entry with a reference tree built layer-by-layer using the Builder
class exported from ./src/builder.ts (also used by Tree.prototype.rebuild
). These tests insert entries in random order in a series of sizes up to 10000 entries, using Q = 4
to maximize tree height and internal complexity. Correctness of syncing is tested by initializing two trees with the same contents, also with Q = 4
, then randomly deleting different sets of entries from each of them, manually tracking the expected set of deltas and testing that t.deepEqual(await collect(sync(source, target)), expected)
.
export type Key = Uint8Array | null
// value is undefined for level > 0 || key === null,
// and a Uint8Array for level === 0 && key !== null.
export type Node = {
level: number
key: Key
hash: Uint8Array
value?: Uint8Array
}
// source and target are never both null.
export type Delta = { key: Uint8Array; source: Uint8Array | null; target: Uint8Array | null }
export type Entry = [key: Uint8Array, value: Uint8Array]
export type Bound<K> = { key: K; inclusive: boolean }
export interface SyncSource {
getRoot(): Promise<Node>
getNode(level: number, key: Key): Promise<Node | null>
getChildren(level: number, key: Key): Promise<Node[]>
}
export interface SyncTarget extends SyncSource {
nodes(
level: number,
lowerBound?: Bound<Key> | null,
upperBound?: Bound<Key> | null,
options?: { reverse?: boolean }
): AsyncIterableIterator<Node>
}
export interface IReadOnlyTransaction extends SyncSource, SyncTarget {
has(key: Uint8Array): boolean
get(key: Uint8Array): Uint8Array | null
keys(
lowerBound?: Bound<Uint8Array> | null,
upperBound?: Bound<Uint8Array> | null,
options?: { reverse?: boolean }
): IterableIterator<Uint8Array>
entries(
lowerBound?: Bound<Uint8Array> | null,
upperBound?: Bound<Uint8Array> | null,
options?: { reverse?: boolean }
): IterableIterator<Entry>
}
export interface IReadWriteTransaction extends IReadOnlyTransaction {
set(key: Uint8Array, value: Uint8Array): void
delete(key: Uint8Array): void
}
export interface ITree {
metadata: Metadata
read<T>(callback: (txn: IReadOnlyTransaction) => Awaitable<T>): Promise<T>
write<T>(callback: (txn: IReadWriteTransaction) => Awaitable<T>): Promise<T>
}
This is an early project that is stil best treated as a research prototype. Feel free to open an issue if you find bugs or have thoughts about API design.
MIT © 2023 Canvas Technologies, Inc.