-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
routing api: clarify string in the protocol field #403
Conversation
This clarifies what values can be put in the field to improve robustness, and that protocols without multicodec registered should ensure global uniqueness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @masih - I think this is a reasonable compromise and language that IPNI software is happy to support.
@@ -54,10 +54,13 @@ Both read and write provider records have a minimal required schema as follows: | |||
|
|||
Where: | |||
|
|||
- `Protocol` is the multicodec name of the transfer protocol or an opaque string (for experimenting with novel protocols without a multicodec) | |||
- `Protocol` is a globlly unique opaque string that represents the transfer protocol used by a content provider. | |||
- If a multicodec has been registered for the protocol, it can be referred to by either its `name` or `code` in `0xHEX` notation. Using the `code` is preferred since it is what is sent on the wire and won't break if the plain text `name` label is ever changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly out of energy to care about what we name things, however a couple things I'd like to call out in relation to the fact that this discussion has been had multiple times in the last year alone and it's not obvious to me what's changed/changing.
- The protocol in Routing V1 IIRC was initially using the code in the table, but that was scrapped because people wanted prettier names
- When the protocol in Routing V1 was based on the code it was using decimal rather than hex since that's more native to JSON and saves on bytes
- At the time that it was decimal it was also brought up that the same argument could be made about multiaddrs which are represented in Routing V1 as text rather than as representations of the binary encoding
So high level questions I'd like to see answered for when someone proposes RoutingV2 which ends up revisiting this again:
- Why are codes preferred over text?
- Why a string hex representation rather than something else like decimal?
- Why 0x rather than multibase and f?
- Why use text for multiaddrs but not data transfer?
To be clear, I don't feel strongly about the answers to any of these and I think there are viable answers for all of them. However, for when this inevitably ends up revisited in the future (we've done it at least twice in the last year or so) I'd like to have a comment I can point at where someone can respond with "we are doing it differently because in retrospect we disagree with the logic around decision X".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are codes preferred over text?
I think text names have been changed at least one time before (I think for json and dag-json, or something like that) and it caused a painful migration. Names can change, while the codes are static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Names can change, while the codes are static.
I mentioned this above already, but if the idea is that names can change why are we using text for multiaddrs? I'm not insisting we have to be consistent here and use text for both or codes for both protocol identifiers and multiaddrs, but this will come up again and for those in support of the change I'd like the reasoning documented which it currently is not.
Co-authored-by: Alex Potsides <[email protected]>
We use codes here as they are more compact in storage. Because the wire values are signed, use of codes leads to efficiency in stored and transferred records.
Since codes are an unbounded numeric space, it is conceivable members may experiment using codes that are outside of the standard decimal representation of javascript. As such, a string representation is stable for transfer across representations.
I would be okay using multibase + f, although I don't know which library I would actually reach for to do the packing/unpacking. In practice, i expect I would just do exact string matching so it wouldn't matter. I think My main argument here is that the "0x" format is what is present in e.g. the multicodec table today, so it's easy to correlate against.
We do have codes for the compact representation of multiaddrs, so I think this differentiation is more about the encoding for use on the wire here. I see two differences:
|
@@ -54,10 +54,13 @@ Both read and write provider records have a minimal required schema as follows: | |||
|
|||
Where: | |||
|
|||
- `Protocol` is the multicodec name of the transfer protocol or an opaque string (for experimenting with novel protocols without a multicodec) | |||
- `Protocol` is a globally unique opaque string that represents the transfer protocol used by a content provider. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globally unique
and opaque
are somewhat conflicting, aren't they? If it's opaque, how do we guarantee that it's globally unique? maybe i'm misunderstanding how opaque applies here, but maybe opaque is a "golang-esque" terminology that shouldn't be in the spec?
side note: I think we over(mis)use the term "opaque"
Closing, as I believe this is obsoleted by content changes from #417, which moves away from trying to follow names from multiformats/multicodec/table.csv, and specifies nothing more than "protocol" being a string ecosystem implementations agree upon. My expectation is that the ecosystem will want to add a list of "well known" ones to the spec (even if it is in "notes for implementers" Appendix), but that will happen once we have trustless gateway protocol used in addition to bitswap for a while (Kubo 0.23 will ship experiment, but it will be a few releases before we have client for this and are comfortable enabling it by default). |
This PR is based on discussions from last week and this notion.
Aims to clarify
Protocol
andSchema
fieldsand close #377