Author: cft
Date: 2023-07-16
License: CC0-1.0
URL: https://github.com/ssbc/sips/blob/master/XXX.md
BIPF (Binary In-Place Format) is a serialization format for JSON-like data types that is optimized for in-place memory access. It is used in some versions of Secure Scuttlebut (SSB), including tinySSB. This document describes BIPF as used in tinySSB.
SSB.classic uses human-readable JSON as representation format: append-only log entries are stored as JSON-encoding of the respective SSB data structure. Using JSON, which has no support for binary data, mandates that byte arrays are specially encoded (as ASCII strings according to BASE64), which further bloats a log entry. By choosing a binary representation as in BIPF, considerable space savings can be achieved and unnecessary transformations of the representation can be avoided. CBOR encoding also provides these features but was not considered because of the possibility of "indefinite lists encoding" which prevents the skipping of a list without parsing its elements. A link to a comprehensive comparison of serialization approaches can be found in the References section; see also the entry for BIPF.rationale.
In order to explain the difference between "JSON data types" and BIPF's "JSON-like data types" we provide two BNF grammers. The first shows a simplified grammar for JSON (that ommits low-level rules e.g. for integers or floating point numbers and strings):
JSON_VAL ::= JSON_ATOM / JSON_LIST / JSON_DICT
JSON_ATOM ::= 'true' / 'false' / 'null' / INT_VAL / DOUBLE_VAL / STR_VAL
JSON_LIST ::= '[' ']' / '[' JSON_VAL *(',' JSON_VAL) ']'
JSON_DICT ::= '{' '}' / '{' STR_VAL ':' JSON_VAL *(',' STR_VAL ':' JSON_VAL) '}'
BIPF is more expressive in that it also supports byte arrays. Moreover, any atom can be used as a key for a dictionary, not only strings. The following grammer for BIPF permits a direct comparison with the previous grammer for JSON:
BIPF_VAL ::= BIPF_ATOM / BIPF_LIST / BIPF_DICT
BIPF_ATOM ::= 'true' / 'false' / 'null' / INT_VAL / DOUBLE_VAL / STR_VAL / BYTES_VAL
BIPF_LIST ::= '[' ']' / '[' BIPF_VAL *(',' BIPF_VAL) ']'
BIPF_DICT ::= '{' '}' / '{' BIPF_ATOM ':' BIPF_VAL *(',' BIPF_ATOM ':' BIPF_VAL) '}'
This grammer can be used as a human-readable format of BIPF data items for documentation purposes. However, BIPF is a binary format wherefore integer values, for example, are not represented as digits drawn from the ASCII character set, but as a series of bytes that encode this value. The binary BIPF represenation of the value 123 will be the byte array #7b#, instead of "123" which as a string would translate to the bytes #313233#.
The Specification section below provides the complete list of encoding rules for the BIPF-supported data types.
data type Some abstract data type that we wish to encode such
that instanves of this data type can be transferred and
reconstructed by a remote peer. These are integers,
strings or lists, for example.
encoding rules A set or reversible mappings for turning potentially
structured in-memory data values into a flat series
of bytes that can be stored and/or transferred.
external representation, sometimes also called "transfer syntax"
A series of bytes that represent the (internal)
values of a data type. From that series of bytes, a
remote peer is able to create its own internal
value of the data.
serialization Turning an internal instance of a BIPF-supported data
type item into a series of bytes that can be transferred.
deserialization Turning a received BIPF series of bytes into an
internal replica of the original data item.
human readable A format expressed in ASCII characters and, if supported
by the rendering, of UTF8 unicode characters in strings.
This is in contrast with BIPF's binary representation where
encoded values cannot be easily parsed by humans.
For each of the 7 atomic and 2 structured types supported by BIPF, a bit pattern is assigned. These type identifiers are later used in the encoding of a corresponding value:
STRING : 0 (000) // utf8 encoded string
BYTES : 1 (001) // raw byte sequence
INT : 2 (010) // little endian, two's complement, minimal number of bytes
DOUBLE : 3 (011) // IEEE 754-encoded double precision floating point
LIST : 4 (100) // sequence of bipf-encoded values
DICT : 5 (101) // sequence of alternating bipf-encoded key and value
BOOLNULL: 6 (110) // 1 = true, 0 = false, no value means null
EXTENDED: 7 (111) // custom type. Specific type should be indicated by varint at start of buffer
Note that the BOOLNULL
bit pattern is used for three different
atomic types.
BIPF values are serialized with a TYPE-LENGTH-VALUE (TLV) encoding.
To this end, T and L are combined into a single integer value called
tag which is encoded with unsigned LEB128 (also called
varint
). The encoded tag is then prepended to the bytes of
the encoding of the value V proper, as shown in the following pseudo
code:
bipf_enc(V) ::= concat( tag(V.type,V.length), V.bytes )
tag(typ,len) ::= LEB128( len<<3 + typ )
LEB128 encoding (used for encoding the tag) is done by producing one encoded byte for each 7 input bit group, starting with the least significant group of 7 bits. All encoded bytes will have the most significant bit set, except the last byte. Zero is encoded as byte 0x00.
Integer values are encoded with the minimally required number of bytes in little-endian order using two's complement representation.
Lists are encoded by prepending to the concatenation of BIPF-encoded
elements a tag with typ=4
and a len
value which is the sum
of the lengths of the BIPF-encoded elements:
bipf([e1, e2]) = concat( tag(4,len(bipf(e1))+len(bipf(e2)))), bipf(e1), bipf(e2) )
A dictionary is encoded as if one would encode an alternating list of
key and value elements, except that the typ
value is DICT (5).
bipf({e1: e2}) = concat(tag(5,len(bipf(e1))+len(bipf(e2)))), bipf(e1), bipf(e2))
Following the JSON tradition, a BIPF library will typicall provide at least these two methods:
bipf_dumps(V) turns an in-memory data item V into a series of bytes
bipf_loads(S) turns a series of bytes S into the corresponding in-memory data item
One can expect BIPF libraries to have additional convenience methods like
bipf_predict_encoded_length(V)
etc.
(a) BIPF.original (see the References section) departs from this
document in the way integer values are encoded. In
https://github.com/ssbc/bipf-spec
,
the authors use a fixed-length encoding for integer values (4 bytes,
little endian, two's complement). With space concerns in mind, tinySSB
formats integers as little endian, two's complement, and retaining only
the minimum number of bytes needed.
(b) As pointed out in the Motivation section, a straight-forward human-readable representation of BIPF exists that is very close to JSON. The only syntactic extension needed is a delimiter for byte arrays where we chose the '#' hash sign as used in S-expressions (which is another serialization format from 1997).
(c) We provide the following test vectors where for each human-readable BIPF-encoding we give its binary representation.
data item as human- its BIPF serialization
readable BIPF as byte array
------------------- ------------------------
null 06
false 0e00
true 0e01
123 0a7b
-123 0a85
"¥€$!" 39c2a5e282ac2421
#ABCD# 11abcd
[123,true] 240a7b0e01
{123:false} 250a7b0e00
{#ABCD#:[123,null]} 3d11abcd1c0a7b06
unsigned LEB128 (varint): https://en.wikipedia.org/wiki/LEB128
,
BASE64: https://www.rfc-editor.org/rfc/rfc4648
BIPF.original: https://github.com/ssbc/bipf-spec
BIPF.rationale: https://github.com/ssbc/bipf
BNF: Section 2 of https://www.rfc-editor.org/rfc/rfc822
CBOR: https://www.rfc-editor.org/rfc/rfc8949
Comparison: https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
JSON: https://www.rfc-editor.org/rfc/rfc8259
S-exp: https://web.archive.org/web/20230223024606/http://people.csail.mit.edu/rivest/Sexp.txt
SSB.classic: https://ssbc.github.io/scuttlebutt-protocol-guide/
UTF8: https://www.rfc-editor.org/rfc/rfc3629
varint in ProtoBuf: https://protobuf.dev/programming-guides/encoding/#varints
BIPF.tinySSB:
- Python: https://pypi.org/project/bipf/
BIPF.classic:
- Go: https://git.sr.ht/~cryptix/go-exp/tree/bipf/item/bipf
- JS: https://github.com/ssbc/bipf/
- Nim: https://github.com/BundleFeed/nim_bipf (npm package with JS compiled version: https://www.npmjs.com/package/nim_bipf)
- Rust: https://github.com/jerive/bipf-rs