Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add metadata to the manifest #960

Merged
merged 16 commits into from
Oct 25, 2024

Conversation

2-towns
Copy link
Contributor

@2-towns 2-towns commented Oct 18, 2024

Closes #865

This PR adds the mimetype, filename, and createdAt timestamp to the manifest.

As discussed in the Codex client meeting, the mimetype is extracted from the content-type header of the request, and the filename is extracted from the content-disposition header.

I updated the test 'node shows used and available space' test because the space.quotaUsedBytes is now 65602 bytes, i.e., 10 bytes more due to the added metadata (I guess).

@2-towns 2-towns requested a review from benbierens October 18, 2024 17:28
@2-towns 2-towns linked an issue Oct 18, 2024 that may be closed by this pull request
@2-towns 2-towns requested a review from gmega October 18, 2024 17:28
Copy link
Member

@gmega gmega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to approve cause I think we should think a bit more about the optional fields, let's see if there's a more elegant way to do this than setting defaults. It also seems to be failing an integration test so we need to look into it as well.

codex/manifest/manifest.nim Outdated Show resolved Hide resolved
codex/rest/api.nim Outdated Show resolved Hide resolved
Copy link
Member

@AuHau AuHau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition! Thanks 🙏 I was myself wanting this addition for some time 😅

I have few comments. Also, I would suggest adding serving these metadata (mainly content type and filename) for the data download using the already used Content-Disposition header, unless I missed it somewhere already?

build.nims Outdated
@@ -42,6 +42,10 @@ task testIntegration, "Run integration tests":
buildBinary "codex", params = "-d:chronicles_runtime_filtering -d:chronicles_log_level=TRACE -d:codex_enable_proof_failures=true"
test "testIntegration"

task testRestApi, "Run rest api tests":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not create another "test running target". These tests are part of Integration tests. I understand that if you are debugging something test you want to run just the failing tests and I guess that is why you introduce these?

My receipt on that is either comment out the tests from for example testIntegration.nim file. Or you can simply compile only the specific test file and run that like:

$ nim c -r --out:build/<<whatEver>> --verbosity:0 --hints:off -d:release -d:chronicles_log_level=TRACE -d:chronicles_sinks="textlines[stdout]"  tests/<<whatEverTestFile>> && ./build/<<whatEver>>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will remove it.

Yes, I think commenting out the tests is the best option for me.

Comment on lines 165 to 175
if pbHeader.getField(8, filename).isErr:
# The filename field is an optional data so we can ignore the error the decoding fails
discard

if pbHeader.getField(9, mimetype).isErr:
# The mimetype field is an optional data so we can ignore the error the decoding fails
discard

if pbHeader.getField(10, uploadedAt).isErr:
# The uploadedAt field is an optional data so we can ignore the error the decoding fails
discard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soooo generally, ignoring all errors is a bad idea. Can you assert that the error was returned because of missing field? Are there other possible errors that could be returned here? (I don't know the API here so I don't really know).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soooo generally, ignoring all errors is a bad idea.

Yes, you're right. The reason I ignored this error is that these fields are optional, and I didn't want to break anything if the decoding process fails because those fields are missing.

Can you assert that the error was returned because of missing field?

I really don't know I need to investigate this. It's coming from libp2p.

@2-towns
Copy link
Contributor Author

2-towns commented Oct 22, 2024

Also, I would suggest adding serving these metadata (mainly content type and filename) for the data download using the already used Content-Disposition header, unless I missed it somewhere already?

Yes I thought about it this morning, will try to add it this afternoon.

@2-towns
Copy link
Contributor Author

2-towns commented Oct 24, 2024

Okay so @benbierens helped to figure out what was the problem.

The test file concerned but this issue was testecbug.nim.

The test was hanging because the store stream was expecting more data than was available, resulting in the error: 'stream boundary not yet reached!'

The issue arose because I added the Content-Length header to the download endpoint. By removing the header, the error is resolved.

@gmega I think this PR is ready to be merged.

Copy link
Member

@gmega gmega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some debug echo statements that shouldn't be there.

codex/rest/api.nim Outdated Show resolved Hide resolved
codex/manifest/coders.nim Show resolved Hide resolved
Copy link
Member

@gmega gmega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gmega gmega added this pull request to the merge queue Oct 25, 2024
Merged via the queue into master with commit 2fb7031 Oct 25, 2024
13 checks passed
@gmega gmega deleted the feat/manifest/add-metadata-to-manifest branch October 25, 2024 15:04
veaceslavdoina pushed a commit that referenced this pull request Nov 4, 2024
* Add metadata to the manifest

* Remove useless import

* Fix the openapi documentation

* Use optional fields instead of default values

* Remove testRestApi target

* Return failure when the protobuf cannot get the field

* Set download headers and fix cors headers when an error is returned

* Add tests to verify the download headers

* Try to adjust the content length header

* Fix convertion to string

* Remove the content length header

* Remove testRestApi target

* Removing debug messages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Richer dataset manifests
3 participants