Issue 382/ support X-Robots-Tag as a typed http header XRobotsTag #393
Not a bad start. Do take your time to complete it and add sufficient start. But great start indeed. Added a couple of remarks to help you with some more guidance.
I've attempted to implement your suggestions, but I still have a couple of uncertainties:
Please note that this implementation is not yet complete, as the |
Questions like this are answered by thinking about its typical usage. Usually the parsing from the raw http header to the typed header will happen in the background as part of layer or web extractor or the like. So while you could try to return something like a custom error with exposure of the kind of error that happened I don't see it all that being useful here, even if you manually decode. Because even if you know what specific error happened, the result is either way the same, it's an invalid header so either you are okay with it or you are not. TLDR: opaque Error is fine here. If there is ever a strong reason to expose certain error variants, it can be requested by opening a new issue about it. But that's a discussion for then.
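As a rough illustration of the "opaque error" point above, here is a minimal sketch. The `InvalidHeader` type and the `decode_max_snippet` function are hypothetical names made up for this example; they are not part of rama or the `headers` crate.

```rust
use std::fmt;

/// Hypothetical opaque error: callers only learn that the header was invalid,
/// not which specific parsing step failed.
#[derive(Debug)]
pub struct InvalidHeader(());

impl fmt::Display for InvalidHeader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("invalid X-Robots-Tag header")
    }
}

impl std::error::Error for InvalidHeader {}

/// Illustrative decoder for a single `max-snippet: <n>` rule.
/// Every failure mode collapses into the same opaque error.
pub fn decode_max_snippet(raw: &str) -> Result<i32, InvalidHeader> {
    // Missing separator, wrong rule name and bad number are all just "invalid".
    let (name, value) = raw.split_once(':').ok_or(InvalidHeader(()))?;
    if !name.trim().eq_ignore_ascii_case("max-snippet") {
        return Err(InvalidHeader(()));
    }
    value.trim().parse::<i32>().map_err(|_| InvalidHeader(()))
}
```

Since the private `()` field cannot be named outside the defining module, error variants could be introduced later without breaking callers.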
The answer is that you'll want to ensure you support this: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Robots-Tag#syntax ; So to judge purely whether it's a valid implementation, you would check against that reference syntax. After that the question will be whether it's an optimal enough implementation, but let's first focus on implementing this correctly.
You might want to read up a bit on how other folks have done it, even in other languages. E.g. look at https://github.com/nielsole/xrobotstag/blob/4cd7d8885a3e26fda9dd1a4075663cd3b80617f0/parser.go#L97. There's a lot to like about that implementation, but of course not to be copied as-is, given there is also plenty not to like about it. But what I personally take away from an implementation like this is:
CustomRule would be a private construct which can be as simple as
As such the only things we really need to expose here are the Not sure if you are a fan of the builder pattern but I like it a lot for cases like this, as for each rule you can expose this method to
The default one would be basically an empty tag, not rendered at all. Which is okay as the
This allows them to be constructed in any way. Feel free to define a private The custom rules can be constructed without exposing the I think this is pretty elegant and does everything you wanted to do without even having to expose our internal representation of it. It also keeps the structure a lot flatter without really losing much, given the
TLDR: great job on your work so far. Seems you are getting pretty comfortable with the codebase and definitely know your way around Rust. I hope my feedback and guidance above help you out in finishing it. Feel free to push back on something you do not agree with or propose alternatives. The goal is mostly to keep this thing as simple and minimal as possible, while still allowing people to do custom stuff, but without exposing too much of our internal API, so that we keep the flexibility to change things internally without breaking anything. Not that I mind making breaking changes where needed, but even better if you can upgrade without needing to break.
I was wondering to maybe add some sort of
|
Out of scope for this PR and overkill for what you need here. A macro_rules macro is plenty for our needs here. If one day you want to learn to write and read proc macros you might enjoy https://www.manning.com/books/write-powerful-rust-macros
Could use some help in my venndb crate for a future release.
another thing, do you prefer a separate builder struct (example: |
The reason to introduce a separate builder struct, e.g. RobotsTag::builder().build() As at that point there is not yet anything set. In my proposal above I said that as a shortcut you can just ignore an empty robots tag and not write it. But if you do want to go this extra mile we can indeed prevent it at compile time, which I would argue is better. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=82444bb713185c3f09ceeb917cc4fd66 here is an example playground to show you what I mean by that. The reason why you would need a builder for that is because:
I'm totally okay with you building that out, good proposal of you, so why not. Btw note also in my example that I add both the "consume" variant ( in cases where you want to pass it immediately to a function or parent structure it is easier to just consume:
but in cases where you work with conditionals it's more convenient to have the setter also available, e.g.:
Of course in the case of a bool this seems a bit silly, as you could just assign |
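The two variants discussed above could look roughly like the sketch below. The struct, its fields and the `with_*` / `set_*` method names are assumptions chosen for illustration; the snippets from the original comment and the linked playground are not reproduced here.

```rust
/// Hypothetical tag with two boolean rules, only for illustration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RobotsTag {
    no_index: bool,
    no_follow: bool,
}

impl RobotsTag {
    /// Consuming variant: handy when the value is passed on immediately.
    pub fn with_no_index(mut self, value: bool) -> Self {
        self.no_index = value;
        self
    }

    /// Setter variant: handy inside conditionals on an existing value.
    pub fn set_no_index(&mut self, value: bool) -> &mut Self {
        self.no_index = value;
        self
    }

    /// Consuming variant for the second rule.
    pub fn with_no_follow(mut self, value: bool) -> Self {
        self.no_follow = value;
        self
    }

    /// Setter variant for the second rule.
    pub fn set_no_follow(&mut self, value: bool) -> &mut Self {
        self.no_follow = value;
        self
    }
}

/// Conditional configuration reads naturally with the setter variant.
pub fn tag_for(private_page: bool) -> RobotsTag {
    let mut tag = RobotsTag::default().with_no_follow(true);
    if private_page {
        tag.set_no_index(true);
    }
    tag
}
```

The consuming variant chains well in expressions, while the `&mut self` variant avoids reassigning the value inside an `if`.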
Hope you are doing well, @hafihaf123. Just checking in here with you, no pressure, take your time, there's no hurry. However, in case you are stuck, have some questions or want to talk something through, do let me know :) I'm here for you. But it's also fine if you are ok as-is and continue at your own pace on your own. All good, just want you to know this.
Sorry for the delay, lately I didn't have much time to code. Anyway, I have completely reworked the API based on your suggestions. Please let me know where it needs improvement and what I should focus on next.
Thx starts to look close to being closed. Besides my remarks on the code, it seems we are still missing the actual implementation of https://docs.rs/rama/latest/rama/http/headers/trait.Header.html, right?
Hello, I would appreciate a review before proceeding with the implementation of the actual encoding and decoding functionalities. Additionally, I would welcome any guidance on the best approach to implement these. My current idea is to follow a structure similar to the Thank you! |
I would appreciate a review before proceeding with the implementation of the actual encoding and decoding functionalities.
Looks pretty good, a couple of small remarks but overall pretty good.
Additionally, I would welcome any guidance on the best approach to implement these.
My current idea is to follow a structure similar to the Via header, implementing the FromStr trait along with a custom CSV parsing function to handle bot names effectively. I would love to hear your thoughts on this approach.
Sounds good, my 2 cents if you wish is as follows:
- I think you can use the regular csv delim logic that's already there
- for each delimited value you could try to use https://doc.rust-lang.org/std/primitive.str.html#method.split_once to see whether or not a new bot name was defined, based on the key-value `:` separator
Mind though that you do want to add some sanity check there to ensure that you only allow starting a new RobotsTag if the previous tag had at least one value set (e.g. cheapest would be by keeping track of a counter or bool or w/e).
Other than that sounds like a sane idea.
Encoding is a lot easier so I guess you do understand that part.
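A minimal sketch of the decoding idea outlined above, assuming a simplified model in which rules are kept as plain strings. The `RobotsTag` shape, the `VALUED_RULES` list and the error strings are all hypothetical. Because some rules are themselves written as `name: value`, the sketch only treats a `key:` prefix as a bot name when `key` is not one of those rule names.

```rust
/// Hypothetical, simplified model: one group of rules per optional bot name.
#[derive(Debug, PartialEq)]
pub struct RobotsTag {
    pub bot_name: Option<String>,
    pub rules: Vec<String>,
}

/// Rule names that carry their own `name: value` pair (assumed subset).
const VALUED_RULES: [&str; 4] = [
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
    "unavailable_after",
];

pub fn parse_x_robots_tag(raw: &str) -> Result<Vec<RobotsTag>, &'static str> {
    let mut tags: Vec<RobotsTag> = Vec::new();
    // Naive comma splitting: date values containing commas are not handled.
    for part in raw.split(',') {
        let mut rule = part.trim();
        if let Some((key, rest)) = rule.split_once(':') {
            let key = key.trim();
            if !VALUED_RULES.iter().any(|r| r.eq_ignore_ascii_case(key)) {
                // `key` is not a known valued rule, so treat it as a bot name
                // and start a new tag for it.
                if key.is_empty() {
                    return Err("empty bot name");
                }
                tags.push(RobotsTag { bot_name: Some(key.to_owned()), rules: Vec::new() });
                rule = rest.trim();
            }
        }
        // Sanity check: every element, including the one that introduced a
        // bot name, must contribute a rule, so no tag is ever left empty.
        if rule.is_empty() {
            return Err("missing rule");
        }
        if tags.is_empty() {
            // Rules before any bot name apply to all crawlers.
            tags.push(RobotsTag { bot_name: None, rules: Vec::new() });
        }
        if let Some(tag) = tags.last_mut() {
            tag.rules.push(rule.to_owned());
        }
    }
    Ok(tags)
}
```

In this sketch the "at least one value" check is enforced by rejecting an empty rule right where a bot name is introduced, so no separate counter is needed.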
I have implemented the initial versions of the encode and decode functionalities. For parsing, I chose an iterator-based approach, where each iteration yields a RobotsTag representing a valid portion of the header for a single bot name. The iterator continues processing until the entire string has been parsed. The implementation is still somewhat rough and may benefit from refinements in structure, error handling, or efficiency. I would appreciate your feedback on areas where the approach could be improved. |
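For the encoding direction, a sketch under the same simplified assumptions (rules stored as strings; `RobotsTag` and `encode` are hypothetical names, not the PR's actual code) could be as small as a `Display` implementation plus a join:

```rust
use std::fmt;

/// Hypothetical, simplified model: one group of rules per optional bot name.
pub struct RobotsTag {
    pub bot_name: Option<String>,
    pub rules: Vec<String>,
}

impl fmt::Display for RobotsTag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The optional bot name prefixes the first rule of its group.
        if let Some(bot) = &self.bot_name {
            write!(f, "{bot}: ")?;
        }
        f.write_str(&self.rules.join(", "))
    }
}

/// Joins all groups into one comma-separated header value.
/// Assumption: groups without a bot name come first, since anything after
/// a bot name would read as belonging to that bot.
pub fn encode(tags: &[RobotsTag]) -> String {
    tags.iter()
        .map(|tag| tag.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}
```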
…ield to 'Vec<Rule>'
use headers::Error;
use rama_core::error::OpaqueError;

macro_rules! robots_tag_builder_field {
I was wondering whether it would make more sense to add a custom rule for `bool` fields, as it would make more sense to just be able to set them to true without having to specify (maybe then also add a `toggle_off` function or something like that). Also, it would prevent building an invalid `RobotsTag` by setting a field to `false` (`RobotsTag::builder().bot_name(None).no_snippet(false).build()`). Please let me know what you think about that.
P. S. It would still be possible to do something like `RobotsTag::builder().bot_name(None).no_follow().toggle_off_no_follow()` to achieve the same thing, but I feel like it is more explicit.
I think for now, to keep it simple:
- ✅ `no_follow()` instead of `no_follow(true)`
- 🚫 let's not add the `toggle_off` versions
Also, while you're at it, I would perhaps not allow `bot_name(None)` but only use it with a value if you want to set a bot name. Such that the usage would be `RobotsTag::builder().bot_name(MyBot).no_follow()` if you want to specify it for a bot, or `RobotsTag::builder().no_follow()` to specify it without a bot.
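A sketch of what that builder surface could look like, with the compile-time "no empty tag" guarantee discussed earlier. All type names (`EmptyBuilder`, `Builder`) and fields here are assumptions for illustration, not the code from the PR.

```rust
/// Hypothetical read-only tag; names and fields are illustrative only.
#[derive(Debug, Default, PartialEq)]
pub struct RobotsTag {
    bot_name: Option<String>,
    no_follow: bool,
    no_index: bool,
}

impl RobotsTag {
    /// Starts in a state that has no `build()` method.
    pub fn builder() -> EmptyBuilder {
        EmptyBuilder { bot_name: None }
    }
    pub fn bot_name(&self) -> Option<&str> {
        self.bot_name.as_deref()
    }
    pub fn no_follow(&self) -> bool {
        self.no_follow
    }
    pub fn no_index(&self) -> bool {
        self.no_index
    }
}

/// Builder state before any rule is set.
pub struct EmptyBuilder {
    bot_name: Option<String>,
}

/// Builder state once at least one rule is set.
pub struct Builder(RobotsTag);

impl EmptyBuilder {
    /// Takes an actual name; there is no way to pass `None`.
    pub fn bot_name(mut self, name: impl Into<String>) -> Self {
        self.bot_name = Some(name.into());
        self
    }
    /// Setting the first rule moves to the buildable state.
    pub fn no_follow(self) -> Builder {
        Builder(RobotsTag { bot_name: self.bot_name, no_follow: true, ..RobotsTag::default() })
    }
    pub fn no_index(self) -> Builder {
        Builder(RobotsTag { bot_name: self.bot_name, no_index: true, ..RobotsTag::default() })
    }
}

impl Builder {
    /// No `bool` argument and no `toggle_off` counterpart.
    pub fn no_follow(mut self) -> Self {
        self.0.no_follow = true;
        self
    }
    pub fn no_index(mut self) -> Self {
        self.0.no_index = true;
        self
    }
    pub fn build(self) -> RobotsTag {
        self.0
    }
}
```

With this shape, `RobotsTag::builder().build()` does not compile, because `EmptyBuilder` has no `build` method.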
What about the setters on a constructed `RobotsTag`? Should they also be changed for `bool` values or kept as-is?
I think the setters should be gone from RobotsTag, gonna be easier if that's just read-only
The builder was relying on the setters to modify the `RobotsTag` structure inside it. How do I change that? Do I keep the setters but restrict their visibility, or do I modify the visibility of the fields, or how else could it be done?
You can pub(super) the properties of RobotsTag
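To illustrate how `pub(super)` fields let a sibling builder module write to an otherwise read-only struct, here is a small sketch. The module layout and names are assumptions and do not mirror the actual files in the PR.

```rust
mod x_robots_tag {
    pub use self::builder::Builder;
    pub use self::robots_tag::RobotsTag;

    mod robots_tag {
        /// Read-only from the outside: only getters are public.
        #[derive(Debug, Default)]
        pub struct RobotsTag {
            // `pub(super)` makes the fields visible in `x_robots_tag`
            // and in every module nested inside it, but nowhere else.
            pub(super) no_follow: bool,
            pub(super) no_index: bool,
        }

        impl RobotsTag {
            pub fn no_follow(&self) -> bool {
                self.no_follow
            }
            pub fn no_index(&self) -> bool {
                self.no_index
            }
        }
    }

    mod builder {
        use super::robots_tag::RobotsTag;

        pub struct Builder(RobotsTag);

        impl Builder {
            pub fn new() -> Self {
                Builder(RobotsTag::default())
            }
            pub fn no_follow(mut self) -> Self {
                // Direct field access works: `builder` lives inside the
                // module the fields are visible in. No setters required.
                self.0.no_follow = true;
                self
            }
            pub fn no_index(mut self) -> Self {
                self.0.no_index = true;
                self
            }
            pub fn build(self) -> RobotsTag {
                self.0
            }
        }
    }
}
```

Code outside `x_robots_tag` can read the tag through its getters but cannot assign to `no_follow` or `no_index` directly.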
Initial Implementation of `X-Robots-Tag` Header

This pull request introduces an initial implementation of the `X-Robots-Tag` header for the `rama-http` crate and closes #382.

Summary of Changes:

Header Registration:
- […] `x-robots-tag` header to the static headers list in `rama-http-types/src/lib.rs`.

Implementation of the Header:
- […] `XRobotsTag` struct that represents the header and implements the `rama_http::headers::Header` trait.
- […] `Rule` enum to represent indexing rules such as `noindex`, `nofollow`, and `max-snippet`. Complex rules like `max-image-preview` and `unavailable_after` are also included.
- […] `Element` struct to represent a rule optionally associated with a bot name.
- […] `XRobotsTag` to iterate over its elements.

File Structure:
- […] `rama-http/src/headers/x_robots_tag/`, which includes the following files:
  - `rule.rs`: Defines the `Rule` enum and parsing logic.
  - `element.rs`: Implements the `Element` struct and its parsing/formatting logic.
  - `iterator.rs`: Provides an iterator for `XRobotsTag`.
  - `mod.rs`: Combines and exposes the module’s functionality.

Encoding and Decoding:
- […] `Header` trait, supporting CSV-style comma-separated values.

Questions and Feedback Requested:

Code Structure:
- […] `x_robots_tag`) appropriate?

Implementation Design:
- […] `Rule` and `Element` structs?
- […] `Vec<Element>` for `XRobotsTag` suitable, or are there optimizations to consider?

Edge Cases and Standards:

Testing:

I look forward to your feedback and suggestions for improvement. Thank you!