Wire format

Now that we have immediates and shallow values, let's see how they are encoded.

This part is where Twine is relatively unique. The closest (to my knowledge) to Twine's data model is Fleece, and Fleece makes different design choices to make different tradeoffs — see why not Fleece for more details —, but the way Twine encodes individual values is closest to CBOR.

Values and immediates in Twine always start with a header byte. The header byte is divided into high and low nibbles (most significant and least significant). The high nibble is the kind of the value; the low nibble is a small integer values with kind-dependent meaning:

  • low is a small integer. If low=15, the header byte is followed by a LEB128 value low_rest, and the resulting integer is low_rest + 15
  • low is exactly as above, with optional LEB128 followup, but represents a length or number of items
  • low is a small integer that represents a small enumeration of values. Each value has a specific meaning.

From now on we refer to these are high (4 bits), and n for the small integer obtained from low and the optional following LEB128 integer.

Here are the possible kinds:

kindmeaning of lowvalue
0enumspecial scalars: true, false, null
1small intnon-negative integer
2small intnegative integer
3enumfloat
4lengthUTF-8 text string (length is number of bytes)
5lengthbyte string (length is number of bytes)
6lengtharray (length is number of items)
7lengthmap (length is number of key/value pairs)
8small inttag (offset follows)
9-reserved
10small intvariant0 (integer is the variant index)
11small intvariant1 (integer is the variant index)
12small intvariantN (integer is the variant index)
13-reserved
14small intreference (integer is the offset delta)
15small intpointer (integer is the offset delta)

Kind=0: special scalars

Layout in bytes:

┌─────────┐
│offset: 0│
├─────────┤
│┌─┬───┐  │
││0│low│  │
│└─┴───┘  │
└─────────┘

These are some immediate values, using this table:

highlowvalue
00false
01true
02null
0_reserved

Kind=1: non-negative integer.

Layout in bytes:

┌─────────┬───────────────────────┐
│offset: 0│offset: 1…n1           │
├─────────┼───────────────────────┤
│┌─┬───┐  │LEB128 length extension│
││1│low│  │                       │
│└─┴───┘  │                       │
└─────────┴───────────────────────┘

The value is the integer n. Only values in [0…2^63-1] should be encoded this way, so that they fit in i64.

For example, the integer 42 is encoded as:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 1f 1b                   ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

Kind=2: negative integer

Layout in bytes:

┌─────────┬───────────────────────┐
│offset: 0│offset: 1…n1           │
├─────────┼───────────────────────┤
│┌─┬───┐  │LEB128 length extension│
││2│low│  │                       │
│└─┴───┘  │                       │
└─────────┴───────────────────────┘

The value, given n, is the signed integer -n-1. Only values in [-2^63…-1] should be encoded this way, so they fit in a i64.

For example, -2 is:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 21                      ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

and -27 is:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 2f 0b                   ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

Note that the value we encode is 26, which overflows 15; so we write low=15 and then 26-15 = 11 = 0x0b as a single byte in LEB128.

Kind=3: float

Layout in bytes:

┌─────────┬─────────────────┐
│offset: 0│offset: 1..5     │
├─────────┼─────────────────┤
│┌─┬─┐    │f32 little endian│
││3│0│    │                 │
│└─┴─┘    │                 │
└─────────┴─────────────────┘

┌─────────┬─────────────────┐
│offset: 0│offset: 1..9     │
├─────────┼─────────────────┤
│┌─┬─┐    │f64 little endian│
││3│1│    │                 │
│└─┴─┘    │                 │
└─────────┴─────────────────┘

The value is a 32 or 64 bits float, encoded as its bits in little endian. If low = 0, the precision is 32 bits; if low = 1, the precision is 64 bits. Other values of low are reserved.

For example, 42.5 : f64 is represented as:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 31 00 00 00 00 00 40 45 ┊ 40                      │
└────────┴─────────────────────────┴─────────────────────────┘

This corresponds to 0x4045400000000000 in little endian, as per https://float.exposed/0x4045400000000000 .

Kind=4: UTF-8 text

Byte layout:

┌─────────┬───────────────────────┬───────────────────┐
│offset: 0│offset: 1…n1           │offset: n1+1…n1+len│
├─────────┼───────────────────────┼───────────────────┤
│┌─┬───┐  │LEB128 length extension│string data        │
││4│low│  │                       │                   │
│└─┴───┘  │                       │                   │
└─────────┴───────────────────────┴───────────────────┘

For example, "hello world! 😁" becomes:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 4f 02 68 65 6c 6c 6f 20 ┊ 77 6f 72 6c 64 21 20 f0 │
│00000010│ 9f 98 81                ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

Note how the header+length is 0x4f02 because the length in bytes is greater than 15.

Kind=5: byte string

Byte layout:

┌─────────┬───────────────────────┬───────────────────┐
│offset: 0│offset: 1…n1           │offset: n1+1…n1+len│
├─────────┼───────────────────────┼───────────────────┤
│┌─┬───┐  │LEB128 length extension│bytes              │
││5│low│  │                       │                   │
│└─┴───┘  │                       │                   │
└─────────┴───────────────────────┴───────────────────┘

This is like text, but the content is an opaque blob with not constraints.

Kind=6: array

Byte layout:

┌─────────┬───────────────────────┬──────────┬──────────┬─────────┬────────────┐
│offset: 0│offset: 1…n1           │offset: … │offset: … │offset: …│offset: …   │
├─────────┼───────────────────────┼──────────┼──────────┼─────────┼────────────┤
│┌─┬───┐  │LEB128 length extension│┌────────┐│┌────────┐│…        │┌──────────┐│
││6│low│  │                       ││array[0]│││array[1]││         ││array[n-1]││
│└─┴───┘  │                       │└────────┘│└────────┘│         │└──────────┘│
└─────────┴───────────────────────┴──────────┴──────────┴─────────┴────────────┘

The number of items is n, and items just follow. The important restriction here is that items must be immediate values, so that iteration remains fast; skipping over an immediate value can be done by reading the header and deducing how many additional bytes need to be skipped.

For example, [[42], 1, 2, 3] is represented as:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 61 1f 1b 64 f3 11 12 13 ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

Note here that the sub-array [42] is 0x611f1b (length=1, followed by 0x1f1b), and then the main array starts at offset=3 with header 0x64.

Kind=7: map

Byte layout:

┌─────────┬───────────────────────┬─────────┬──────────┬─────────┬──────────┬────────────┐
│offset: 0│offset: 1…n1           │offset: …│offset: … │offset: …│offset: … │offset: …   │
├─────────┼───────────────────────┼─────────┼──────────┼─────────┼──────────┼────────────┤
│┌─┬───┐  │LEB128 length extension│┌──────┐ │┌────────┐│…        │┌────────┐│┌──────────┐│
││6│low│  │                       ││key[0]│ ││value[0]││         ││key[n-1]│││value[n-1]││
│└─┴───┘  │                       │└──────┘ │└────────┘│         │└────────┘│└──────────┘│
└─────────┴───────────────────────┴─────────┴──────────┴─────────┴──────────┴────────────┘

Like arrays, maps use n to encode the number of key/value pairs.

For example, {"a": 42, "b": false} is represented as:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 72 41 61 1f 1b 41 62 00 ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

We can see the keys 0x4161 and 0x4162. Note that in general, keys don't have to be strings.

Kind=8: tag

Byte layout:

┌─────────┬───────────────────────┬─────────────┐
│offset: 0│offset: 1…n1           │offset: n1+1…│
├─────────┼───────────────────────┼─────────────┤
│┌─┬───┐  │LEB128 length extension│┌─────┐      │
││6│low│  │                       ││value│      │
│└─┴───┘  │                       │└─────┘      │
└─────────┴───────────────────────┴─────────────┘

The tag is n, and the value follows immediately as an immediate. Tagging an array or map or other tag requires the value to be written first, and referred to using a pointer.

Kind=10,11,12: variant

Byte layout:

┌─────────┬───────────────────────┐
│offset: 0│offset: 1…n1           │
├─────────┼───────────────────────┤
│┌──┬───┐ │LEB128 length extension│
││10│low│ │                       │
│└──┴───┘ │                       │
└─────────┴───────────────────────┘

┌─────────┬───────────────────────┬─────────────┐
│offset: 0│offset: 1…n1           │offset: n1+1…│
├─────────┼───────────────────────┼─────────────┤
│┌──┬───┐ │LEB128 length extension│┌─────┐      │
││11│low│ │                       ││value│      │
│└──┴───┘ │                       │└─────┘      │
└─────────┴───────────────────────┴─────────────┘

┌─────────┬───────────────────────┬───────────────┬─────────┬─────────┬─────────┬───────────┐
│offset: 0│offset: 1…n1           │offset: n1+1…n2│offset: …│offset: …│offset: …│offset: …  │
├─────────┼───────────────────────┼───────────────┼─────────┼─────────┼─────────┼───────────┤
│┌──┬───┐ │LEB128 length extension│LEB128         │┌───────┐│┌───────┐│…        │┌─────────┐│
││12│low│ │                       │               ││args[0]│││args[1]││         ││args[n-1]││
│└──┴───┘ │                       │               │└───────┘│└───────┘│         │└─────────┘│
└─────────┴───────────────────────┴───────────────┴─────────┴─────────┴─────────┴───────────┘

Here we have multiple cases:

  • high=10: just like a non-negative integer, but n` is the variant index
  • high=11: just like a tag, but n is the variant index. The single argument must be an immediate.
  • high=12: the variant index is n, and is followed by a LEB128 number indicating how many arguments follow immediately. The arguments must be immediates.

Kind=14: reference

Layout in bytes:

┌─────────┬───────────────────────┐
│offset: 0│offset: 1…n1           │
├─────────┼───────────────────────┤
│┌──┬───┐ │LEB128 length extension│
││14│low│ │                       │
│└──┴───┘ │                       │
└─────────┴───────────────────────┘

The reference is, logically, an offset (coming strictly earlier in the byte stream). If the reference is at offset p, then the offset it points to is p-n-1.

For example, the value 42 followed by a reference to it is encoded as:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 1f 1b e1                ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘

Kind=15: pointer

Layout in bytes:

┌─────────┬───────────────────────┐
│offset: 0│offset: 1…n1           │
├─────────┼───────────────────────┤
│┌──┬───┐ │LEB128 length extension│
││15│low│ │                       │
│└──┴───┘ │                       │
└─────────┴───────────────────────┘

This is like a reference, but it's implicitly followed.

For example, the value 42 followed by a pointer to it is encoded as:

┌────────┬─────────────────────────┬─────────────────────────┐
│00000000│ 1f 1b f1                ┊                         │
└────────┴─────────────────────────┴─────────────────────────┘