Writing Bops: The Bebop Schema Language

Andrew Sampson edited this page Dec 2, 2020 · 15 revisions

Bebop schemas are written in the custom Bebop Schema Language, which this page documents.

Definition syntax

A Bebop schema consists of a series of definitions. Each is introduced by a keyword, followed by a name and a body in curly braces. Here is an example schema, demonstrating the three kinds of definition Bebop accepts:

enum Instrument {
    Sax = 0;
    Trumpet = 1;
    Clarinet = 2;
}

struct Performer {
    string name;
    Instrument plays;
}

message Song {
    string title = 1;
    uint16 year = 2;
    Performer[] performers = 3;
}

Let's go over each of these:

Enum

An enum defines a type that acts as a wrapper around uint32, with certain named constants, each having a corresponding underlying integer value. It is used much like an enum in C.

The syntax is: enum Flavor { Vanilla = 1; Chocolate = 2; Mint = 3; }.

  • Unlike in C, all constants must be explicitly given an integer literal value.

  • You should never remove a constant from an enum definition. Instead, put [deprecated("reason here")] in front of the name.

  • You're free to add new constants to an enum at any point in the future.
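As a rough illustration of "a wrapper around uint32", the Flavor enum above could be mirrored by hand in Python like this. It is a sketch, not the output of the Bebop compiler: the helper names encode_flavor and decode_flavor are hypothetical, and the little-endian byte order is an assumption that this page does not state.

```python
import struct
from enum import IntEnum


class Flavor(IntEnum):
    # Every constant carries an explicit integer value, as Bebop requires.
    Vanilla = 1
    Chocolate = 2
    Mint = 3


def encode_flavor(value: Flavor) -> bytes:
    # Assumption: the underlying uint32 is written little-endian.
    return struct.pack("<I", int(value))


def decode_flavor(data: bytes) -> Flavor:
    (raw,) = struct.unpack("<I", data[:4])
    # Raises ValueError for a value this copy of the enum does not know.
    return Flavor(raw)
```

Because constants may be added later, a reader built from an older schema can receive a value it has no name for. This sketch raises ValueError in that case; generated code may well behave differently.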

Struct

A struct defines an aggregation of "fields", containing typed values in a fixed order. All values are always present. It is used much like a struct in C.

The syntax is: struct Point { int32 x; int32 y; }.

  • The binary representation of a struct is simply that of all field values in order.
    This means it's more compact and efficient than a message.

  • When you define a struct, you're promising to never add or remove fields from it.
    (If this turns out to be necessary, you'll have to define a struct MyStructV2 and deprecate the old struct MyStruct.)

  • When you define a struct with the readonly modifier, the Bebop compiler guarantees that its values cannot be modified or updated after decoding takes place. Use this to ensure data integrity when marshalling between language domains.
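To make "all field values in order" concrete, here is a hand-written Python sketch for struct Point { int32 x; int32 y; }. The names Point, POINT_LAYOUT, encode_point and decode_point are made up for illustration, and the little-endian byte order is an assumption; the frozen dataclass only loosely imitates what the readonly modifier promises.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)  # loosely mirrors a readonly struct: no mutation after creation
class Point:
    x: int
    y: int


# Two int32 values back to back, with no tags, lengths, or padding.
# Assumption: little-endian byte order.
POINT_LAYOUT = struct.Struct("<ii")


def encode_point(p: Point) -> bytes:
    return POINT_LAYOUT.pack(p.x, p.y)


def decode_point(data: bytes) -> Point:
    x, y = POINT_LAYOUT.unpack(data[:POINT_LAYOUT.size])
    return Point(x, y)
```

Under these assumptions a Point always occupies exactly 8 bytes, which is why adding or removing a field later would break every existing reader.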

Message

A message defines an indexed aggregation of fields containing typed values, each of which may be absent. It might correspond to something like a class in Java, or a JSON object.

The syntax is: message Song { string title = 1; uint16 year = 2; } — note the indices.

  • In the binary representation of a message, the message is prefixed with its length, and each field is prefixed with its index.

  • It's okay to add fields to a message with new indices later — in fact, this is the whole point of message. (When an unrecognized field index is encountered in the process of decoding a message, it is skipped over. This allows for compatibility with versions of your app that use an older version of the schema.)
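The two bullets above can be sketched in Python for message Song { string title = 1; uint16 year = 2; }. Several details here are assumptions that this page does not spell out: a little-endian uint32 length prefix covering the message body, a one-byte field index, a zero byte marking the end of the message, and skipping an unknown field by jumping to the end of the message. The function names are hypothetical.

```python
import struct


def encode_song(title=None, year=None) -> bytes:
    body = bytearray()
    if title is not None:
        raw = title.encode("utf-8")
        body += bytes([1]) + struct.pack("<I", len(raw)) + raw  # index 1: string
    if year is not None:
        body += bytes([2]) + struct.pack("<H", year)  # index 2: uint16
    body.append(0)  # assumed end-of-message marker
    return struct.pack("<I", len(body)) + bytes(body)


def decode_song(data: bytes) -> dict:
    (length,) = struct.unpack_from("<I", data, 0)
    pos, end = 4, 4 + length
    song = {}
    while pos < end:
        index = data[pos]
        pos += 1
        if index == 0:
            break
        if index == 1:
            (n,) = struct.unpack_from("<I", data, pos)
            pos += 4
            song["title"] = data[pos:pos + n].decode("utf-8")
            pos += n
        elif index == 2:
            (song["year"],) = struct.unpack_from("<H", data, pos)
            pos += 2
        else:
            # Unknown index, e.g. from a newer schema. Its size is unknown,
            # so use the length prefix to jump past the rest of the message.
            pos = end
    return song
```

Absent fields are simply missing from the returned dict, and a packet carrying an index this decoder has never heard of still decodes the fields that came before it.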

Notes

When talking about Bebop, the word "record" is used to mean "either a struct or a message".

In any definition, ; is used to delimit items. In a record definition, each field is specified by giving the name of the type of the field, followed by the name of the field, followed by ;.

Types

The following types are built-ins:

Name         Description
bool         A Boolean value, true or false.
byte         An unsigned 8-bit integer.
uint16       An unsigned 16-bit integer.
int16        A signed 16-bit integer.
uint32       An unsigned 32-bit integer.
int32        A signed 32-bit integer.
uint64       An unsigned 64-bit integer.
int64        A signed 64-bit integer.
float32      A 32-bit IEEE single-precision floating point number.
float64      A 64-bit IEEE double-precision floating point number.
string       A length-prefixed UTF-8-encoded string.
guid         A GUID.
date         A UTC date / timestamp.
T[]          A length-prefixed array of T values. array[T] is an alias.
map[T1, T2]  A map, as a length-prefixed array of (T1, T2) association pairs.

You may also use user-defined types (enums and other records) as field types.

A string is stored as a length-prefixed array of bytes. All length-prefixes are 32-bit unsigned integers, which means the maximum number of bytes in a string, or entries in an array or map, is about 4 billion (2^32).
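A small Python sketch of these length prefixes, for a string and for a uint16[] array. The function names are hypothetical and the little-endian byte order is an assumption. Note that the string prefix counts encoded bytes rather than characters, while the array prefix counts entries.

```python
import struct


def encode_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    # uint32 prefix holding the number of UTF-8 bytes that follow.
    return struct.pack("<I", len(raw)) + raw


def decode_string(data: bytes, pos: int = 0):
    """Return (text, position just past the string)."""
    (n,) = struct.unpack_from("<I", data, pos)
    start = pos + 4
    return data[start:start + n].decode("utf-8"), start + n


def encode_uint16_array(values) -> bytes:
    # uint32 prefix holding the number of entries, then each uint16 in order.
    return struct.pack("<I", len(values)) + struct.pack(f"<{len(values)}H", *values)
```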

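A guid is stored as 16 bytes, in the order produced by .NET's Guid.ToByteArray: the first three groups are little-endian, and the remaining eight bytes are stored as written.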
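In Python, the standard library's uuid.UUID.bytes_le uses this same mixed-endian layout, so a hand-written codec can be sketched as below. The function names encode_guid and decode_guid are hypothetical.

```python
import uuid


def encode_guid(value: uuid.UUID) -> bytes:
    # bytes_le stores the first three fields little-endian and leaves
    # the final eight bytes in their written order.
    return value.bytes_le


def decode_guid(data: bytes) -> uuid.UUID:
    return uuid.UUID(bytes_le=bytes(data[:16]))
```

For example, 00112233-4455-6677-8899-aabbccddeeff is laid out as 33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff.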

A date is stored as a 64-bit integer amount of “ticks” since 00:00:00 UTC on January 1 of year 1 A.D. in the Gregorian calendar, where a “tick” is 100 nanoseconds.
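The tick arithmetic can be checked with a short Python sketch. The names EPOCH, to_ticks and from_ticks are hypothetical; the sketch covers only the conversion to and from a tick count, not how the 64-bit integer is laid out on the wire. Python's datetime has microsecond resolution, so the last decimal digit of a tick count is lost when converting back.

```python
from datetime import datetime, timedelta, timezone

# 00:00:00 UTC on January 1 of year 1 in the (proleptic) Gregorian calendar.
EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)
TICKS_PER_MICROSECOND = 10  # one tick is 100 nanoseconds


def to_ticks(moment: datetime) -> int:
    # Expects a timezone-aware datetime.
    delta = moment.astimezone(timezone.utc) - EPOCH
    return (delta // timedelta(microseconds=1)) * TICKS_PER_MICROSECOND


def from_ticks(ticks: int) -> datetime:
    return EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)
```

With these definitions, the Unix epoch (1970-01-01 00:00:00 UTC) corresponds to 621355968000000000 ticks.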

Annotations

The “deprecated” annotation

Use [deprecated("We no longer use this")] before a field. When encoding a message, deprecated fields are skipped. A notice will also be copied into the generated code.

Opcodes

Use [opcode(0x12345678)] before a record definition to associate an identifying "opcode" with it. You can also use a 4-byte ASCII string as an opcode: [opcode("Ping")].

Strictly speaking, Bebop is not opinionated about what you do with these opcodes. But you may find it useful to send this kind of thing over the wire:

12 34 56 78     03 00 00 00 18 00 ...
[4-byte opcode] [Bebop-encoded data]

You can then use the 4-byte opcode to decide which decoder or handler to dispatch the rest of the packet to. For more information, see Mirrors.

All the compiler does is check that no opcode is used twice, and add something like class Foo { const int Opcode = 0x12345678; ... } in the generated code for you to use in your dispatching code.
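One possible shape for such dispatching code, sketched in Python. Everything here is hypothetical (the HANDLERS table, the handles decorator, the handler functions), and handlers are keyed by the four raw bytes as they appear on the wire, which sidesteps the question of how an integer opcode is ordered into bytes. The duplicate check imitates the compiler's "no opcode is used twice" rule.

```python
HANDLERS = {}


def handles(opcode: bytes):
    """Register a handler for a 4-byte opcode, rejecting duplicates."""
    def register(func):
        if opcode in HANDLERS:
            raise ValueError(f"opcode {opcode!r} is used twice")
        HANDLERS[opcode] = func
        return func
    return register


@handles(bytes.fromhex("12345678"))
def on_foo(payload: bytes) -> str:
    # A real handler would decode the Bebop payload here.
    return f"foo with {len(payload)} payload bytes"


@handles(b"Ping")
def on_ping(payload: bytes) -> str:
    return "pong"


def dispatch(packet: bytes) -> str:
    opcode, payload = packet[:4], packet[4:]
    try:
        handler = HANDLERS[opcode]
    except KeyError:
        raise ValueError(f"no handler for opcode {opcode!r}") from None
    return handler(payload)
```

Feeding it the bytes 12 34 56 78 03 00 00 00 18 00 routes the six bytes after the opcode to on_foo.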

Comments

As in many C-like languages, // starts a comment until the end of the line, whereas /* and */ delimit a block comment.

If a comment is placed directly before a field specification (/* like so */ int32 x;) or before a definition (/* like so */ struct S { ... }), that comment will be copied over as "documentation" to the corresponding bit of generated code.