uip | title | description | author | status | type | category | created |
---|---|---|---|---|---|---|---|
75 | %ames: Directed Messaging | a request/response discipline for the network. | ~master-morzod | Draft | Standards Track | Kernel | ~2023.8.9 |
Imposing a request/response discipline on all %ames messages and packets provides legibility at every layer, making the entire network easier to reason about, extending the urbit network to new use-cases and enabling a system that is more stable, reliable, and scalable.
A new message layer is introduced, unifying %ames and fine. %ames' existing flow semantics will be rebased as a new layer on top of these messages.
- %peek: a stateless, referentially-transparent read
- %poke: a stateful, acknowledged, referentially-transparent write
- %page: marked data, bound via authentication to the namespace
(names TBD)
A %peek message is a request for a %page at a path in the global namespace. It is unauthenticated and anonymous. In the future, request authentication could be used to gate access to computational resources and enable QoS, but never for access control to data itself.
%peek can be injected as an event, but must not change formal state. It should be handled by dereferencing the request path via arvo's +peek arm on the publishing ship. A %peek message produces at-most-one %page response -- blocking/crashing requests are dropped.
%peek is equivalent to fine's %keen.
A %page message is a public, authenticated binding of marked data in the urbit namespace -- in practical terms, a response to a %peek or %poke request. The binding between path and data must never change; any violation of this principle should be shared widely and have severe reputational consequences for the offending party.
A %page message is injected as an event. If it correlates to an outstanding request -- via an exact match of its path -- it is processed with arbitrary stateful semantics. If it does not correlate, it is silently dropped.
%page is equivalent to fine's %tune.
A %poke message pushes a %page message from requester to requestee, specifying the path by which the requestee will acknowledge the %poke. It must be straightforward to unambiguously correlate the payload path to the prefix of the request path, such that the authentication of the payload is sufficient to authenticate the request (as is trivially the case for %ames' flows).
A %poke message can be resolved by dereferencing the request path via arvo's +peek arm on the publishing ship. If the path can be resolved, the message has already been processed and the result is the response %page. If not, the %poke must be injected as an event. A valid %poke message produces exactly one %page response -- crashing requests are converted to negative acknowledgments.
%poke generalizes %ames' %plea and %boon messages, simultaneously rendering "nacksplanations" superfluous.
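The resolution rule for %poke can be illustrated with a minimal sketch. Everything here is hypothetical: `handle_poke`, the dict standing in for the namespace (and for arvo's +peek), and the `("ack", None)` / `("nack", reason)` page shapes are assumptions made for illustration, not part of the proposal.

```python
def handle_poke(namespace, ack_path, payload, apply_event):
    """Return the %page acknowledging a %poke, processing the payload at most once.

    namespace   -- dict standing in for the path -> data binding (assumption)
    ack_path    -- request path at which the acknowledgment will be bound
    apply_event -- callable performing the stateful work; may raise
    """
    # Stand-in for +peek: if the path resolves, the poke was already processed.
    if ack_path in namespace:
        return namespace[ack_path]
    try:
        apply_event(payload)                # inject as an event
        page = ("ack", None)
    except Exception as err:
        page = ("nack", str(err))           # a crash becomes a negative acknowledgment
    namespace[ack_path] = page              # bound once, never rebound
    return page
```

Because the acknowledgment is itself a namespace binding, a retransmitted %poke is answered from the binding rather than re-applied, which is one way to read "exactly one %page response".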
Packets recapitulate the structure of messages exactly, specifying their precise serialization and fragmentation, with sufficient metadata for (and sized for) straightforward, maximal deliverability over current networks.
Every packet has a maximum size of 1,472 bytes, and begins with a 4-byte header. The header must encode (in no particular order):
- the protocol version (3 bits)
- the packet type (2 bits)
- the publisher rank (2 bits, only for requests)
- a truncated mug checksum (20 bits)
- hopcount (5-7 bits available)
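As a minimal sketch of how these fields could share one 32-bit word: the proposal leaves the bit order open, so the layout below (version in the low bits, then type, rank, mug, hopcount), the 5-bit hopcount, the little-endian byte order, and the names `pack_header` and `unpack_header` are all assumptions.

```python
# Hypothetical layout: (field name, width in bits), packed from the
# least significant bit upward. The proposal does not fix this order.
FIELDS = [
    ("version", 3),
    ("type", 2),
    ("rank", 2),
    ("mug", 20),
    ("hopcount", 5),
]
assert sum(width for _, width in FIELDS) == 32  # exactly fills 4 bytes


def pack_header(**values):
    """Pack the named fields into a 4-byte header."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word.to_bytes(4, "little")


def unpack_header(header):
    """Recover the field values from the first 4 bytes of a packet."""
    word = int.from_bytes(header[:4], "little")
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out
```

With the rank field present the widths above leave 5 bits for the hopcount; on responses, where rank is not needed, those 2 bits could be reclaimed, which matches the "5-7 bits available" range.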
The remaining 1,468 bytes are allocated as follows:
- %peek: { publisher[<=16], tag-byte[1], path[<=328] }
- %poke: { publisher[<=16], path[<=328], total[4], authenticator[96], fragment[<=1024] }
- %page: { next-hop[6], path[<=328], total[4], authenticator[96], fragment[<=1024] }
These values are as follows:
- publisher: ship from the root of the request type, variable length, based on rank in header
- %peek tag-byte: future-proofing for signed requests, possibly hash-based exclusions
- path: length-prefixed (2 bytes) request path, with 4-byte fragment number
  - in the case of %poke: the request and payload paths are concatenated
- total: number of fragments in %page response (max size: 4TiB)
- authenticator: see discussion below
- fragment: bloq 13 slice of the serialized message
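The byte budget implied by these layouts can be checked with a few lines of arithmetic. The field sizes are copied from the text; the dictionary and function names are illustrative only.

```python
MAX_PACKET = 1472            # bytes, from the text
HEADER = 4
BODY = MAX_PACKET - HEADER   # 1468 bytes left after the header

# Maximum field sizes in bytes, copied from the three layouts above.
LAYOUTS = {
    "peek": {"publisher": 16, "tag-byte": 1, "path": 328},
    "poke": {"publisher": 16, "path": 328, "total": 4,
             "authenticator": 96, "fragment": 1024},
    "page": {"next-hop": 6, "path": 328, "total": 4,
             "authenticator": 96, "fragment": 1024},
}


def slack(kind):
    """Bytes left unused when every field of `kind` is at its maximum."""
    return BODY - sum(LAYOUTS[kind].values())


# A bloq-13 slice is 2^13 bits, i.e. 1024 bytes per fragment.
FRAGMENT_BYTES = (1 << 13) // 8
# A 4-byte `total` distinguishes at most 2^32 fragment counts.
MAX_MESSAGE_BYTES = FRAGMENT_BYTES * (1 << 32)

for kind in LAYOUTS:
    print(kind, slack(kind))   # peek 1123, poke 0, page 10
print(MAX_MESSAGE_BYTES // 2**40, "TiB")   # 4 TiB
```

The %poke layout fills the body exactly, while %page has 10 bytes to spare, which is the headroom that could go to a longer path or a larger next-hop field.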
outstanding questions:
- precise order and interpretation of header bits
- hopcount semantics (actual + max; saturating counter?)
- authenticator details
- fixed length limits
  - %page
    - max path length could increase by ten
    - or next-hop could increase by 12 bytes and prepare for ipv6 (offsetting path)
  - %poke structure
    - currently, request and payload paths are concatenated to simplify max-length calculation
    - if specified as variable length
      - could be two separate paths
      - or simply two concatenated packets (%peek and %page)
While normally outside the scope of these proposals, the routing and forwarding mechanisms anticipated for these new packet types fundamentally motivate their design. We will switch from stateless to stateful forwarding, tracking requests and correlating responses to them. Continuing our tradition of a single implementation for consumers, subscribers, and forwarding nodes, the ames i/o driver in vere will track in ephemeral state:
- our active requests
- "pending interest" table (possibly unified with our requests): (map request (pair @da (set lane)))
- cache: (map request response), possibly including time or other metadata to prioritize eviction
- transitive sponsorship predicate
- routing table for transitive sponsees
The transitions on the lifecycle of a packet are as follows:
packet lifecycle:
- send a request packet
  - request routing is exactly the same as current ames
    - if we don't have a direct route to "her", send through "her" galaxy
    - sending a packet on multiple lanes should be a single effect
  - the ames driver will additionally track the set of active requests in ephemeral state
- receive request packet
  - if the response is in cache, send it
  - if the same request is already pending
    - add the source lane to the pending set
    - update liveness timestamp
    - XX when should relays re-send, if ever?
  - add to pending table, advance the request
    - if it's not for us
      - if we're a galaxy, try to forward
      - otherwise, drop the packet
    - otherwise, handle ourselves
- forward request packet to "her" (as galaxy)
  - if we transitively sponsor/parent "her"
    - forward if we have a route for "her"
    - log if not
  - otherwise, drop the packet
- handle request packet
  - if the request is stateful: enqueue event
  - if the request is stateless
    - +peek on request path: success
      - send to all lanes in pending set
      - delete pending entry
      - put response in cache (XX policy)
    - failure: drop the packet
- receive response packet
  - if a request is pending
    - send on all associated lanes
    - put in cache
  - if we requested it
    - XX determined via "pending" table or separate state?
    - inject response packet as event
  - otherwise, drop it
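The lifecycle above can be modeled as a toy relay for stateless requests. This is a sketch under stated assumptions, not the vere driver: the `Relay` class, its method names, the use of the publisher's name as its lane, the returned status strings, and the choice to delete a pending entry when a request is dropped are all hypothetical.

```python
import time


class Relay:
    """Toy model of the ephemeral state kept by a forwarding node."""

    def __init__(self, us, sponsees=()):
        self.us = us
        self.sponsees = set(sponsees)  # ships we transitively sponsor
        self.pending = {}              # request path -> (last heard, set of lanes)
        self.cache = {}                # request path -> response
        self.sent = []                 # (lane, packet) effects, for inspection

    def on_request(self, path, publisher, lane, peek=None, now=None):
        now = time.time() if now is None else now
        if path in self.cache:                       # answer from cache
            self.sent.append((lane, self.cache[path]))
            return "cached"
        if path in self.pending:                     # same request already pending
            _, lanes = self.pending[path]
            self.pending[path] = (now, lanes | {lane})
            return "joined"
        self.pending[path] = (now, {lane})
        if publisher != self.us:                     # not for us
            if publisher in self.sponsees:
                # Assumption: the publisher's name doubles as its lane.
                self.sent.append((publisher, ("request", path)))
                return "forwarded"
            del self.pending[path]                   # assumption: forget dropped requests
            return "dropped"
        response = peek(path) if peek else None      # stand-in for +peek
        if response is None:
            del self.pending[path]
            return "dropped"
        self.on_response(path, response)
        return "answered"

    def on_response(self, path, response):
        entry = self.pending.pop(path, None)
        if entry is None:                            # unsolicited response
            return "dropped"
        for lane in sorted(entry[1]):                # return along every breadcrumb
            self.sent.append((lane, response))
        self.cache[path] = response
        return "returned"
```

Two requests for the same path arriving on different lanes produce a single forwarded packet, and the eventual response is fanned back out to both lanes, which is the multicast behavior that the pending table is meant to enable.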
NB: these namespace examples are provisional.
~zod sends a gall poke to ~nec, creating a new flow:
- ~zod places an ames %plea (3 fragments) in the namespace at /~zod//~nec/flow/bone=0/message=0
- ~zod issues a %poke packet for /~nec//~zod/ack/bone=0/message=0/fragment=0
  - with the %page at /~zod//~nec/flow/bone=0/message=0/fragment=0 as the payload
- ~nec processes the request as an event
- ~nec issues a stateless request for /~zod//~nec/flow/bone=0/message=0/fragment=1
- ~zod responds statelessly, ~nec receives
- ~nec issues a stateless request for /~zod//~nec/flow/bone=0/message=0/fragment=2
- ~zod responds statelessly, ~nec receives
- ~nec issues the response n/ack for the initial request at /~nec//~zod/ack/bone=0/message=0/fragment=0
~zod subscribes to ~nec, on the same flow:
- ~zod places the %plea for a gall %watch to ~nec in the namespace at /~zod//~nec/flow/bone=0/message=1
- ~zod issues a %poke packet for /~nec//~zod/ack/bone=0/message=1/fragment=0
  - with the %page at /~zod//~nec/flow/bone=0/message=1/fragment=0 as the payload
- ~nec responds with an ack at /~nec//~zod/ack/bone=0/message=1/fragment=0
- ~nec issues a subscription update
  - placing the %boon in the namespace at /~nec//~zod/flow/bone=1/message=0
- ~nec issues a %poke packet for /~zod//~nec/ack/bone=1/message=0/fragment=0
  - with the %page at /~nec//~zod/flow/bone=1/message=0/fragment=0 as the payload
- ~zod issues the response n/ack for the %boon at /~zod//~nec/ack/bone=1/message=0/fragment=0
Note that the bones used here follow %ames' current scheme (orphaning nacksplanations, which can instead be retrieved via %peek). This is not strictly necessary, since the namespace structure itself encodes flow direction, but it's likely the simplest option for migrating from the current system.
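The pairing between a payload path and its acknowledgment path in the examples above can be sketched with two helpers. The path shapes are copied from the provisional examples; the function names `flow_path` and `ack_path` are hypothetical.

```python
def flow_path(sender, receiver, bone, message, fragment=None):
    """Path in the sender's namespace where a message (or one fragment) lives."""
    path = f"/{sender}//{receiver}/flow/bone={bone}/message={message}"
    return path if fragment is None else f"{path}/fragment={fragment}"


def ack_path(sender, receiver, bone, message, fragment=0):
    """Path in the receiver's namespace where the n/ack will be bound."""
    return f"/{receiver}//{sender}/ack/bone={bone}/message={message}/fragment={fragment}"


# ~zod pokes ~nec: the %poke requests the ack path and carries the flow fragment.
print(ack_path("~zod", "~nec", 0, 0))      # /~nec//~zod/ack/bone=0/message=0/fragment=0
print(flow_path("~zod", "~nec", 0, 0, 0))  # /~zod//~nec/flow/bone=0/message=0/fragment=0
```

Swapping sender and receiver (and using bone 1) yields the paths for the %boon that ~nec sends back, since each direction is rooted in the namespace of the ship that publishes the data.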
Our current namespace-data response packets (%tune) are signed twice: first at the message level, and again at the packet level. Even with our use of ed25519, this is very expensive. Performance can be increased by batching both signature generation and verification; this should eventually be done, but it will complicate our interfaces and implementation.
Signing individual packets is strictly overkill -- the crucial attestation is the message-level binding of path and data. But we still need a practical mechanism for authenticating individual packets within a message; if verification fails, which packets need be re-requested? Ideally, we can extend the message level authentication to an individual packet, without separately authenticating each.
Merkle trees provide a way to accomplish this: authenticate the message by signing the merkle root of all its fragments, then validate any fragment with a "merkle audit path": all the branches not taken on the path from the root to the fragment.
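A merkle audit path can be sketched in a few lines. This is an illustration only: it uses BLAKE2b from the Python standard library as a stand-in hash (not the hash the proposal discusses), omits the signature over the root, and its tree shape (odd nodes promoted unchanged) and function names are assumptions rather than a proposed encoding.

```python
import hashlib


def h(data):
    # Stand-in hash with a 32-byte digest (assumption).
    return hashlib.blake2b(data, digest_size=32).digest()


def leaf(fragment):
    return h(b"\x00" + fragment)       # domain-separate leaves from inner nodes


def node(left, right):
    return h(b"\x01" + left + right)


def build_levels(fragments):
    """Return every level of the tree, leaves first, root last."""
    levels = [[leaf(f) for f in fragments]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = [node(prev[i], prev[i + 1]) for i in range(0, len(prev) - 1, 2)]
        if len(prev) % 2:
            nxt.append(prev[-1])       # odd node is promoted unchanged
        levels.append(nxt)
    return levels


def audit_path(levels, index):
    """Sibling hashes (the branches not taken) from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            path.append((sibling < index, level[sibling]))
        index //= 2
    return path


def verify(root, fragment, path):
    """Check one fragment against the root using only its audit path."""
    acc = leaf(fragment)
    for sibling_is_left, sibling in path:
        acc = node(sibling, acc) if sibling_is_left else node(acc, sibling)
    return acc == root
```

A receiver holding an authenticated root can then accept or reject each fragment on its own, at the cost of roughly one hash per tree level in the audit path.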
Even this may be overkill: we don't need to (easily) authenticate each fragment in isolation, but only in the context of all preceding fragments. We'll usually receive them in order, and we can certainly process them in order.
options:
- sign every packet as we do now
- sign a tree root, attach the entire tree in the message
- sign a tree root, attach intermediate nodes to fragments in the order we will encounter them
This last option is the design of the Bao verified streaming format, which was the genesis of BLAKE3, a fast cryptographic hash with an internal tree structure.
We could simply use the Bao encoding, but we would lose the nature of fragments as static offsets into the serialized message. And our limited packet size (and effectively unbounded message size) means that we don't always have space to attach all the intermediate node hashes we'll need.
We may be able to defer fragment authentication to some interval for large messages, but in the case of verification failure, we'll still need to be able to isolate the offending packets. So we'll need a way to retrieve intermediate hashes explicitly, as "merkle audit paths" or in some other order. Depending on the details, this could be a good use-case for unauthenticated (ie, trivially self-authenticating) response packets, as it would be unfortunate to have to sign some part of a merkle tree when we've already signed the root.
The 96 bytes allotted above are intended to store:
- a signature only in the case of a single-fragment message
- a signature and a root hash on the first fragment of a message
- up to 3 intermediate node hashes on subsequent fragments
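The three cases fit the 96-byte field as follows, assuming a 64-byte ed25519 signature and 32-byte hashes (the default output size mentioned in the questions below). The last function is a rough, hypothetical estimate of how many sibling hashes a full audit path needs in a balanced binary tree, to show why three slots cannot always cover it.

```python
SIGNATURE = 64        # bytes in an ed25519 signature
HASH = 32             # bytes per hash, assuming the default output size
AUTHENTICATOR = 96    # bytes allotted in the packet layouts

CASES = {
    "single-fragment message": SIGNATURE,
    "first fragment (signature + root hash)": SIGNATURE + HASH,
    "subsequent fragment (3 node hashes)": 3 * HASH,
}
for name, size in CASES.items():
    assert size <= AUTHENTICATOR, name


def audit_path_hashes(num_fragments):
    """Upper bound on sibling hashes per fragment: ceil(log2(num_fragments))."""
    return (num_fragments - 1).bit_length()


print(audit_path_hashes(8))        # 3: fits in one authenticator
print(audit_path_hashes(9))        # 4: already one too many
print(audit_path_hashes(1 << 32))  # 32: a maximum-size message
```

Any message longer than eight fragments therefore needs some hashes to arrive separately or to be amortized across fragments, which is the ordering question raised below.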
detailed questions:
- can we adapt ed25519 from sha-512 to blake3
- either way, will we need extended output from blake3 (default is 32 bytes)
- can we straightforwardly compute the "best" intermediate node hashes to attach to a given fragment
XX note ames message-level vs packet-level encryption
Until now, %ames has used "criss-cross" routing: two peers communicating without direct routes will each send their messages through the other's sponsoring galaxy. This topology is right and just: a relay must both be generally available and have a route to the destination ship. Establishing and maintaining a route from a relay to a destination ship must be the responsibility of that ship, as the relay can only learn a ship's location by hearing from it. Galaxies are axiomatically available (DNS bindings, well-known ports), and each ship regularly pings its sponsoring galaxy. So selection of a relay for a given ship is a coordination problem, and any other forwarding topology would just bring additional route-maintenance responsibilities with it.
This model imposes mutual routing responsibility on all communicating peers -- in asymmetric publisher/subscriber scenarios, the publisher must explicitly route to each subscriber, imposing linear scaling costs. However, we can take advantage of this asymmetry: by explicitly modeling packets as requests or responses and tracking requests, we can simply "return" responses back along the route where we heard the request.
Network messages are already expressed in request/response terms: a %keen requests a %tune, while both %plea and %boon can be considered requests for a/n n/ack. So long as these semantics are legible in the packet format, requests can be forwarded statefully (leaving breadcrumbs along the forwarding path) and responses can simply be "returned" to the requester (consuming the breadcrumbs). This "symmetric" approach to routing, from Van Jacobson's NDN and related projects, has several advantages:
- it plays well with stateful firewalls (responses return on the socket from which the request was sent)
- it limits much of the routing state to a table of short-lived requests ("pending interests")
- it enables anonymous network clients (as responses can be delivered without the need to explicitly address the requester)
- it presents opportunities for transparent transport agnosticism (via relays)
- it places in-network caches ideally (in the case of recursive sponsor routing)
Currently, %ames routes are discovered in three ways: axioms, metadata on forwarded packets, and metadata about received packets. Galaxies are always available (discovered through urbit.org DNS bindings) and forward packets, and those packets are decorated with a first-hop "origin" lane. In this way, ships can attempt to "tighten" routing, sending directly to the "origin" lane and additionally through the galaxy. The source lane is injected into arvo along with every packet; when packets are received without an origin, the source lane is considered a direct route to that peer.
This current model works reasonably well with only one level of forwarding, but has never worked across multiple relays. The new request/response routing discipline is an opportunity to make route discovery more rigorous, and generalize it across forwarding models.
In short:
- galaxies are still axiomatically available, and now forward only for their recursive sponsees
- instead of an "origin" lane in packet metadata, we should have a "next-hop" lane
  - ie, set by every relay, not just the first
- the "next-hop" lane on a response packet is a candidate for incrementally tightening the route for a future request to the same ship
- direct routes can be learned from the source lane of a request or response packet with no "next-hop" lane and/or a hopcount of zero
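The route-learning rules above might look like the following. This is a hypothetical sketch: the `learn_route` name, the shape of the `routes` table, and the returned labels are assumptions, and the "and/or" in the last rule is read here as "or".

```python
def learn_route(routes, ship, source_lane, next_hop, hopcount, is_response=False):
    """Update routes[ship] from one received packet and report what was learned."""
    entry = routes.setdefault(ship, {"direct": None, "candidates": []})
    if next_hop is None or hopcount == 0:
        # No relay touched the packet: the source lane is a direct route.
        entry["direct"] = source_lane
        return "direct"
    if is_response and next_hop not in entry["candidates"]:
        # A relay reported the lane it heard from: try it on a future request.
        entry["candidates"].append(next_hop)
        return "candidate"
    return "none"
```

A candidate is only a hint for tightening the route: a later request sent to it either succeeds, at which point the response arrives with no "next-hop" lane and the route becomes direct, or it fails and the galaxy path continues to be used.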
XX need effects from ames to vere to track routes
Fine packets don't strictly need to change to be routed in this way; the binding between %keen requests and %tune responses is already legible. But they contain more metadata than is strictly necessary: both sender (requester) and receiver (responder) addresses are formally unnecessary, although the receiver is encoded more accessibly as metadata than in the path.
Ames packets must change so that requests (%plea, %boon, and nacksplanation) can be individually correlated to responses (n/acks). The minimal change would simply add a tuple of flow, message, and fragment numbers to the existing packet metadata, as these together form a unique identifier for the payload on the sender side, and the n/ack on the receiver side.
It would be possible to deliver %ames messages through %keen requests with one extra feature: the namespace read (+peek) would need to be upgradeable to an event in certain circumstances (probably only signed requests). In the case of ~zod poking ~nec, it would look like:
- ~zod sends request packet for /~nec/ack/~zod/flow/$FLOW/$MESSAGE/$FRAGMENT, no request body
- ~nec driver recognizes and injects packet as event
- ~nec sends request packet for /~zod/poke/~nec/flow/$FLOW/$MESSAGE/$FRAGMENT, no request body
- ~nec receives response, "returns" n/ack response
This is elegant from a certain perspective -- such a pattern would let us build all network communication out of a single primitive: pulling a datum at a time. The downside is that we would double the number of packets and roundtrips.
However, when we admit a stateful request packet type, we can still achieve pull-based data flow for large messages:
- ~zod sends a stateful request for /~nec/ack/~zod/flow/$FLOW/$MESSAGE
- request body includes the response for /~zod/poke/~nec/flow/$FLOW/$MESSAGE/1
- ~nec requests additional fragments as needed
- ~nec "returns" the n/ack response for ~zod's original request
This places all %ames messages clearly in the namespace, legibly binding requests to responses, and putting data receivers (mostly) in charge of congestion control, without any additional roundtrips or packets. What's more, 2 * num-fragments writes (to the event log) per message in the existing model turn into 1 + num-fragments writes -- a boon to %gall publishers everywhere.
These features and changes may well be developed and released in the opposite order that they are introduced here. Fine can already be routed like this, and a new format for ames' existing messages can likely be added to ames as is. A new layering, with a higher-level message interface and flows reimplemented on top, should significantly simplify ames, but it's a non-trivial change. The external message-level interface is not needed until practical transport agnosticism is being directly pursued (about which more soon).
Existing %fine packets can be routed statefully in this model, but can only be multicast with each other (unless relays can convert bidirectionally between old and new structures). Existing %ames packets will continue to be routed statelessly, as (mutual) requests.
This style of request/response packet routing introduces two forms of denial of service: request flooding and cache poisoning. The former is straightforwardly mitigated by route-based rate limiting. The latter attack exists due to the performance overhead of data authentication; regardless of the model, it's likely impractical for relays to authenticate all packets. Cache poisoning can be mitigated with strict cache policies (only cache the subset of packets that we do authenticate), heuristic content authentication, hash-based exclusion on request packets, or a gossip mechanism to remove invalid content from caches. The simplest approach will be to start with a strict policy.
Copyright and related rights waived via CC0.