IPIP-412: Signaling Block Order in CARs on Gateways
First draft based on various prior art and recent discussions cited in
the header front matter.
lidel committed May 15, 2023
1 parent b07b1bc commit 63ea4ff
Showing 2 changed files with 241 additions and 1 deletion.
2 changes: 1 addition & 1 deletion src/http-gateways/trustless-gateway.md
@@ -63,7 +63,7 @@ mode and `Accept` header is missing
The following response types MUST be supported:

- [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw) – requests a single, verifiable raw block to be returned
- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned
- [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned, implementations MAY support optional parameters (:cite[ipip-0412])
- [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) – requests a verifiable :cite[ipns-record] (multicodec `0x0300`).

# HTTP Response
240 changes: 240 additions & 0 deletions src/ipips/ipip-0412.md
@@ -0,0 +1,240 @@
---
title: "IPIP-0412: Signaling Block Order in CARs on HTTP Gateways"
date: 2023-05-15
ipip: proposal
editors:
- name: Marcin Rataj
github: lidel
url: https://lidel.org/
- name: Jorropo
github: Jorropo
relatedIssues:
- https://github.com/ipfs/specs/issues/348
- https://github.com/ipfs/specs/pull/330
- https://github.com/ipfs/specs/pull/402
- https://github.com/ipfs/specs/pull/412
order: 412
tags: ['ipips']
---

## Summary

Adds support for additional, optional content type options that allow the
client and server to signal or negotiate a specific block order in the returned
CAR.

## Motivation

We want to make it easier to build light clients for IPFS, and we want them to
have a low memory footprint on arbitrarily sized files. The main pain point
preventing this is that the block order in CAR responses is not specified.

Without a specified order, a client has to keep some kind of reference to
previously seen blocks, either on disk or in memory, for two reasons:

1. Blocks can arrive out of order, meaning the moment a block is received might
not match the moment it is consumed (data is read and returned to the consumer).
1. Blocks can be reused multiple times. This is handy when you plan to cache on
disk, but not at all when you want to process a stream with a use-and-forget
policy.

What we really want is for the gateway to help us a bit, and give us blocks in
a useful order.

The existing Trustless Gateway specification does not provide a mechanism for
negotiating the order of blocks in CAR responses.

This IPIP aims to improve the status quo.

## Detailed design

The CAR content type
([`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car))
already supports the `version` parameter, which allows a gateway to indicate
which CAR flavour is returned with the response.

The proposed solution introduces two new content type parameters, `order` and
`dups`, for use in the `Accept` header of HTTP requests and the `Content-Type`
header of HTTP responses.

The `order` parameter allows the client to indicate its preference for a
specific block order in the CAR response, and the `dups` parameter specifies
whether duplicate blocks are allowed in the response.

### Signaling in Request

Content type negotiation is based on section 12.5.1 of :cite[rfc9110].

Clients MAY indicate their preferred block order by sending an `Accept` header in
the HTTP request. The `Accept` header format is as follows:

```
Accept: application/vnd.ipld.car; version=1; order=dfs; dups=y
```

In the future, when more orders or parameters exist, clients will be able to
specify a list of preferences, for example:

```
Accept: application/vnd.ipld.car;order=foo, application/vnd.ipld.car;order=dfs;dups=y;q=0.5
```

The above example is a list of preferences: the client would prefer the
hypothetical `order=foo`, but if it is not available, it would accept
`order=dfs` with `dups=y` instead (lower priority indicated via the `q`
parameter, as noted in :cite[rfc9110]).
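
To illustrate how a gateway might read such a list, here is a minimal Python sketch. The names `CarPreference` and `parse_car_accept` are hypothetical and not part of this IPIP, and the parsing is simplified: it splits on `,` and `;` without handling quoted strings, which a full :cite[rfc9110] parser would need to do.

```python
from typing import NamedTuple


class CarPreference(NamedTuple):
    params: dict  # content type parameters such as version, order, dups
    q: float      # relative weight; 1.0 when the q parameter is absent


def parse_car_accept(header: str) -> list:
    """Return CAR preferences from an Accept header, highest weight first."""
    prefs = []
    for item in header.split(","):
        media_type, *raw_params = [part.strip() for part in item.split(";")]
        if media_type.lower() != "application/vnd.ipld.car":
            continue  # other media types are ignored in this sketch
        params, q = {}, 1.0
        for raw in raw_params:
            if not raw:
                continue  # tolerate a trailing ";"
            name, _, value = raw.partition("=")
            name, value = name.strip().lower(), value.strip()
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # malformed weight: keep the default of 1.0
            else:
                params[name] = value
        prefs.append(CarPreference(params, q))
    # sorted() is stable, so entries with equal weight keep header order
    return sorted(prefs, key=lambda p: p.q, reverse=True)


header = ("application/vnd.ipld.car;order=foo, "
          "application/vnd.ipld.car;order=dfs;dups=y;q=0.5")
for pref in parse_car_accept(header):
    print(pref.q, pref.params)
```

A gateway could then walk the returned list and serve the first preference it is able to satisfy.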

#### `order` CAR content type parameter

The `order` parameter accepts the following values:

- `dfs`: [Depth-First Search](https://en.wikipedia.org/wiki/Depth-first_search)
order, allows for streaming responses with minimal memory usage
- `rnd`: Unknown (random) order, the implicit default when the `order` parameter is missing.

#### `dups` CAR content type parameter

The `dups` parameter specifies whether duplicate blocks (the same block
occurring multiple times in the requested DAG) will be present in the CAR
response.

It accepts two values:
- `y`: duplicate blocks are allowed
- `n`: duplicates are not allowed

When allowed (`y`), light clients are able to discard blocks after
reading them, removing the need for caching in-memory or on-disk.
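
The effect of `dups` on a depth-first stream can be sketched with a toy DAG. The snippet below is an illustration only: `dfs_blocks`, the dictionary-based DAG, and the block names are hypothetical stand-ins for real CIDs and blocks, and the traversal details are an assumption rather than something this IPIP defines.

```python
def dfs_blocks(dag: dict, root: str, dups: bool) -> list:
    """List block IDs in depth-first order, as a gateway might emit them."""
    emitted = []
    seen = set()

    def visit(cid: str) -> None:
        if not dups and cid in seen:
            return  # dups=n: a block that was already sent is skipped
        seen.add(cid)
        emitted.append(cid)
        for child in dag[cid]:
            visit(child)

    visit(root)
    return emitted


# Hypothetical DAG: the leaf "chunk-a" is referenced twice by "root".
dag = {"root": ["chunk-a", "chunk-b", "chunk-a"], "chunk-a": [], "chunk-b": []}

print(dfs_blocks(dag, "root", dups=True))   # ['root', 'chunk-a', 'chunk-b', 'chunk-a']
print(dfs_blocks(dag, "root", dups=False))  # ['root', 'chunk-a', 'chunk-b']
```

With `dups=y` the stream repeats `chunk-a` at the point where it is needed again, so a reader can drop each block after use. With `dups=n` the reader has to keep `chunk-a` around, because it will not be sent a second time.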

<!-- TODO: do we need a parameter for inclusion of identity CIDs?
It seems to be only relevant in Filecoin due to legacy hiccup:
https://github.com/ipfs/specs/pull/330#issuecomment-1274106892 -->

### Signaling in Response

The Trustless Gateway MUST always respond with a `Content-Type` header that includes
information about all supported/known parameters, even if the client did not
specify them in the request.

The `Content-Type` header format is as follows:

```
Content-Type: application/vnd.ipld.car;version=1;order=dfs;dups=y
```


Gateway implementations are free to decide on the implicit default ordering or
other parameters, and to use them in responses when the client did not
explicitly specify a parameter, or requested an unsupported or unknown one.

Implementations MAY choose to implement only some of the parameters.
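
As a rough sketch of the response rules above, a gateway could combine the requested parameters with its own defaults as follows. `SUPPORTED`, `DEFAULTS`, and `response_content_type` are hypothetical names, and the chosen defaults are an assumption made for illustration; each implementation decides its own. Negotiation of `version` is left out for brevity.

```python
# Hypothetical capabilities and implicit defaults of one gateway.
SUPPORTED = {"order": {"dfs", "rnd"}, "dups": {"y", "n"}}
DEFAULTS = {"version": "1", "order": "dfs", "dups": "n"}


def response_content_type(requested: dict) -> str:
    """Build the Content-Type value for a CAR response."""
    chosen = dict(DEFAULTS)
    for name, value in requested.items():
        if value in SUPPORTED.get(name, ()):
            chosen[name] = value
        # unknown parameters and unsupported values keep the default
    params = ";".join(f"{name}={chosen[name]}" for name in ("version", "order", "dups"))
    return f"application/vnd.ipld.car;{params}"


print(response_content_type({"order": "dfs", "dups": "y"}))
print(response_content_type({"order": "foo"}))  # unsupported order, defaults apply
```

The response always lists every parameter the gateway knows about, so the client can see what it actually received even when its request was ignored or only partly honored.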

## Design rationale

The proposed specification change aims to address the limitations of the
existing Trustless Gateway specification by introducing a mechanism for
negotiating the block order in CAR responses.

By allowing clients to indicate their preferred block order, Trustless Gateways
can cache CAR responses for popular content, resulting in improved performance
and reduced network load. Clients benefit from more efficient data handling by
deserializing blocks as they arrive.

We reuse existing HTTP content type negotiation and the CAR content type, which
already has the optional `version` parameter.

### User benefit

The proposed specification change brings several benefits to end users:

1. Improved Performance: Gateways can decide on their implicit default ordering
and cache CAR responses for popular content. In turn, clients can benefit
from a strong `Etag` in ordered (deterministic) responses. This reduces the
response time for subsequent requests, resulting in faster content retrieval
for users.

2. Reduced Memory Usage: Clients no longer need to buffer the entire CAR
response in memory until the deserialization of the requested entity is
finished. With the ability to deserialize blocks as they arrive, users can
conserve memory resources, especially when dealing with large CAR responses.

3. Efficient Data Handling: By discarding blocks as soon as the CID is
validated and data is deserialized, clients can efficiently process the data
in real-time. This is particularly useful for light clients, IoT devices,
mobile web browsers, and other streaming applications where immediate access
to the data is required.

4. Customizable Ordering: Clients can indicate their preferred block order in the
`Accept` header, allowing them to prioritize specific ordering strategies that
align with their use cases. This flexibility enhances the user experience
and empowers users to optimize content retrieval according to their needs.

### Compatibility

The proposed specification change is backward compatible with existing client
and server implementations.

Trustless Gateways that do not support the negotiation of block order in CAR
responses will continue to function as before, providing their existing default
behavior, and clients will be able to detect it by inspecting the
`Content-Type` header present in the HTTP response.

Clients that do not send the `Accept` header or do not recognize the `order`
and `dups` parameters in the `Content-Type` header will receive and process CAR
responses as they did before: buffering/caching all blocks until done with the
final deserialization.
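
The detection step could look like the following sketch on the client side. `can_stream` is a hypothetical helper, and treating only `order=dfs` together with `dups=y` as safe for a use-and-forget reader is an assumption based on the parameter descriptions in this document. The parsing is simplified and does not handle quoted parameter values.

```python
def can_stream(content_type: str) -> bool:
    """True when blocks may be processed and discarded as they arrive."""
    media_type, *raw_params = [part.strip() for part in content_type.split(";")]
    if media_type.lower() != "application/vnd.ipld.car":
        return False
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name.strip().lower()] = value.strip()
    # Missing parameters suggest an older gateway: the order is unknown,
    # so the client falls back to buffering or caching all blocks.
    return params.get("order") == "dfs" and params.get("dups") == "y"


print(can_stream("application/vnd.ipld.car;version=1;order=dfs;dups=y"))  # True
print(can_stream("application/vnd.ipld.car"))                             # False
```

A client that gets `False` here loses nothing compared to today: it processes the CAR exactly as it did before this IPIP.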

Existing implementations can choose to adopt the new specification and
implement support for the negotiation of block order incrementally. This allows
for a smooth transition and ensures compatibility with both new and old
clients.

### Security

The proposed specification change does not introduce any negative security
implications beyond those already present in the existing Trustless Gateway
specification. It focuses on enhancing performance and data handling without
affecting the underlying security model of IPFS.

Light clients with support for `order` and `dups` CAR content type parameters
will be able to detect a malicious response faster, reducing the risk of
memory-based DoS attacks from malicious gateways.

### Alternatives

Several alternative approaches were considered before arriving at the proposed solution:

1. Implicit Server-Side Configuration: Instead of negotiating the block order
in the CAR response, the Trustless Gateway could have a server-side
configuration that specifies the default order. However, this approach would
limit the flexibility for clients, requiring them to have prior knowledge
of the order supported by each gateway.

2. Fixed Block Order: Another option was to enforce a fixed block order in the
CAR responses. However, this approach would not cater to the varying needs
and preferences of different clients and use cases, and is not backward
compatible with the existing Trustless Gateways, which return CAR responses
with a weak `Etag` and unspecified block order.

3. Separate `X-` HTTP Header: Introduction of a separate HTTP header was
rejected because we try to use HTTP semantics where possible, and gateways
already use HTTP content type negotiation for CAR `version` and reusing it
saves a few bytes in each round-trip. Also, :cite[rfc6648] advises against
the use of `X-` and similar constructs in new protocols.

The proposed solution of negotiating the block order through headers is
future-proof, and allows for flexibility, interoperability, and customization
while maintaining compatibility with existing implementations.

## Test fixtures

Implementation compliance can be determined by testing the negotiation process
between clients and Trustless Gateways using various combinations of `order` and
`dups` parameters.

TODO:
1. a CAR with blocks for a small file in DFS order
2. a CAR with blocks for a small file with one block appearing twice


### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
