Issue 382/ support X-Robots-Tag as a typed http header XRobotsTag #393

Open: wants to merge 37 commits into base: main

Commits (37)
bc86ca7 add XRobotsTag, initial implementation (hafihaf123, Jan 12, 2025)
4888e89 add value_string.rs (hafihaf123, Jan 15, 2025)
2dc2e93 add more context with comments (hafihaf123, Jan 15, 2025)
e6b7b53 add ValidDate, custom rules (hafihaf123, Jan 15, 2025)
fe4394c fix value_string.rs visibility issues (hafihaf123, Jan 15, 2025)
09775f7 rename Iterator to ElementIter (hafihaf123, Jan 17, 2025)
51b3171 fix visibility issues (hafihaf123, Jan 17, 2025)
62bd0d6 change trait TryFrom<&[&str]> to private function from_iter (hafihaf123, Jan 17, 2025)
f17550c separate 'split_csv_str' function from 'from_comma_delimited' (hafihaf123, Jan 17, 2025)
dcf3586 change bot_name field type to 'HeaderValueString' and indexing_rule f… (hafihaf123, Jan 17, 2025)
879394d implement FromStr for Element (hafihaf123, Jan 17, 2025)
33b63f3 reformat with rustfmt (hafihaf123, Jan 17, 2025)
9e0b8ae todo/ fix XRobotsTag::decode() (hafihaf123, Jan 17, 2025)
bae6cad add chrono crate to dependencies (hafihaf123, Jan 27, 2025)
f8d78ac rework API (hafihaf123, Jan 27, 2025)
6a52814 fix chrono dependency placement (hafihaf123, Jan 27, 2025)
749c086 enhance code, add valid_date.rs (hafihaf123, Jan 27, 2025)
3aa02ec add x_robots_tag.rs (hafihaf123, Jan 29, 2025)
b4a1fe1 implement FromStr for ValidDate (hafihaf123, Jan 29, 2025)
a480e13 enhance code (hafihaf123, Jan 31, 2025)
1f32008 implement display for ValidDate (hafihaf123, Feb 1, 2025)
7f62911 improve error handling for HeaderValueString (hafihaf123, Feb 1, 2025)
26ef8c6 initial implementation of encode, decode (hafihaf123, Feb 1, 2025)
29bc0f9 enhance decode/parse functionalities (hafihaf123, Feb 2, 2025)
d731bcf remove magic strings in max_image_preview_setting.rs (hafihaf123, Feb 2, 2025)
0d055a8 remove from_str functionality (moved to Parser::next()) (hafihaf123, Feb 2, 2025)
266552d fix error checking (hafihaf123, Feb 2, 2025)
64fb073 builder checks validity at compile-time (hafihaf123, Feb 2, 2025)
c59690a fix RobotsTag::is_valid_field_name (hafihaf123, Feb 2, 2025)
6c1944b add tests for valid_date.rs (hafihaf123, Feb 2, 2025)
83b118a change visibility (hafihaf123, Feb 5, 2025)
8344c93 add parsing from rfc 850 (hafihaf123, Feb 8, 2025)
36c031c fix timezone lookup logic (hafihaf123, Feb 8, 2025)
8b6fc68 add docs (hafihaf123, Feb 9, 2025)
de26de8 make RobotsTag read-only, enhance Builder API (hafihaf123, Feb 10, 2025)
51afbf8 remove unused functions (hafihaf123, Feb 10, 2025)
00468dc add docs (hafihaf123, Feb 11, 2025)
74 changes: 73 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -42,6 +42,7 @@ bitflags = "2.4"
 md5 = "0.7.0"
 brotli = "7"
 bytes = "1"
+chrono = "0.4.39"
 clap = { version = "4.5.15", features = ["derive"] }
 crossterm = "0.27"
 csv = "1.3.1"
6 changes: 6 additions & 0 deletions rama-http-types/src/lib.rs
@@ -131,6 +131,12 @@ pub mod header {
         "x-real-ip",
     ];
 
+    // non-std web-crawler info headers
+    //
+    // More information at
+    // <https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Robots-Tag>.
+    static_header!["x-robots-tag"];
+
     /// Static Header Value that can be used as `User-Agent` or `Server` header.
     pub static RAMA_ID_HEADER_VALUE: HeaderValue = HeaderValue::from_static(
         const_format::formatcp!("{}/{}", rama_utils::info::NAME, rama_utils::info::VERSION),
1 change: 1 addition & 0 deletions rama-http/Cargo.toml
@@ -30,6 +30,7 @@ async-compression = { workspace = true, features = [
 base64 = { workspace = true }
 bitflags = { workspace = true }
 bytes = { workspace = true }
+chrono = { workspace = true }
 const_format = { workspace = true }
 csv = { workspace = true }
 futures-lite = { workspace = true }
6 changes: 6 additions & 0 deletions rama-http/src/headers/mod.rs
@@ -102,4 +102,10 @@ pub mod authorization {
 pub use ::rama_http_types::headers::HeaderExt;

 pub(crate) mod util;
+
+pub mod x_robots_tag_components;
+
+mod x_robots_tag;
+pub use x_robots_tag::XRobotsTag;
+
 pub use util::quality_value::{Quality, QualityValue};
26 changes: 15 additions & 11 deletions rama-http/src/headers/util/csv.rs
@@ -11,24 +11,28 @@ use crate::HeaderValue;
 pub(crate) fn from_comma_delimited<'i, I, T, E>(values: &mut I) -> Result<E, Error>
 where
     I: Iterator<Item = &'i HeaderValue>,
-    T: ::std::str::FromStr,
-    E: ::std::iter::FromIterator<T>,
+    T: std::str::FromStr,
+    E: FromIterator<T>,
 {
     values
         .flat_map(|value| {
-            value.to_str().into_iter().flat_map(|string| {
-                string
-                    .split(',')
-                    .filter_map(|x| match x.trim() {
-                        "" => None,
-                        y => Some(y),
-                    })
-                    .map(|x| x.parse().map_err(|_| Error::invalid()))
-            })
+            value
+                .to_str()
+                .into_iter()
+                .flat_map(|string| split_csv_str(string))
         })
         .collect()
 }

+pub(crate) fn split_csv_str<T: std::str::FromStr>(
+    string: &str,
+) -> impl Iterator<Item = Result<T, Error>> + use<'_, T> {
+    string.split(',').filter_map(|x| match x.trim() {
+        "" => None,
+        y => Some(y.parse().map_err(|_| Error::invalid())),
+    })
+}
+
 /// Format an array into a comma-delimited string.
 pub(crate) fn fmt_comma_delimited<T: fmt::Display>(
     f: &mut fmt::Formatter,
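For readers who want to try the extracted helper in isolation, here is a minimal standalone sketch of the same split, trim, skip-empty, parse pipeline. It is not the crate's code: `ParseError` is a hypothetical stand-in for the crate's `Error::invalid()`, and the explicit `'a` lifetime replaces the `use<'_, T>` precise-capturing syntax so the sketch does not depend on a recent compiler.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the crate's `Error::invalid()`.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError;

/// Split a comma-delimited string, drop empty elements, and parse the rest.
/// Each element yields its own `Result`, so the caller decides how to collect.
pub fn split_csv_str<'a, T>(string: &'a str) -> impl Iterator<Item = Result<T, ParseError>> + 'a
where
    T: FromStr + 'a,
{
    string.split(',').filter_map(|x| match x.trim() {
        // Empty elements (e.g. from ",," or a trailing comma) are skipped.
        "" => None,
        // Everything else is parsed; the concrete parse error is discarded.
        y => Some(y.parse().map_err(|_| ParseError)),
    })
}
```

Collecting into `Result<Vec<T>, ParseError>` stops at the first element that fails to parse, which mirrors how `from_comma_delimited` collects into its generic `E`.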
2 changes: 2 additions & 0 deletions rama-http/src/headers/util/mod.rs
@@ -1,3 +1,5 @@
 pub(crate) mod csv;
 /// Internal utility functions for headers.
 pub(crate) mod quality_value;
+
+pub(crate) mod value_string;
63 changes: 63 additions & 0 deletions rama-http/src/headers/util/value_string.rs
@@ -0,0 +1,63 @@
+use http::header::HeaderValue;
+use std::fmt::{Display, Formatter};
+use std::{
+    fmt,
+    str::{self, FromStr},
+};
+
+/// A value that is both a valid `HeaderValue` and `String`.
+#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
+pub struct HeaderValueString {
+    /// Care must be taken to only set this value when it is also
+    /// a valid `String`, since `as_str` will convert to a `&str`
+    /// in an unchecked manner.
+    value: HeaderValue,
+}
+
+impl HeaderValueString {
+    pub(crate) fn as_str(&self) -> &str {
+        // HeaderValueString is only created from HeaderValues
+        // that have validated they are also UTF-8 strings.
+        unsafe { str::from_utf8_unchecked(self.value.as_bytes()) }
+    }
+}
+
+impl fmt::Debug for HeaderValueString {
+    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
+        fmt::Debug::fmt(self.as_str(), f)
+    }
+}
+
+impl Display for HeaderValueString {
+    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
+        fmt::Display::fmt(self.as_str(), f)
+    }
+}
+
+impl<'a> From<&'a HeaderValueString> for HeaderValue {
+    fn from(src: &'a HeaderValueString) -> HeaderValue {
+        src.value.clone()
+    }
+}
+
+#[derive(Debug)]
+pub struct FromStrError(&'static str);
+
+impl FromStr for HeaderValueString {
+    type Err = FromStrError;
+
+    fn from_str(src: &str) -> Result<Self, Self::Err> {
+        // A valid `str` (the argument)...
+        src.parse()
+            .map(|value| HeaderValueString { value })
+            .map_err(|_| FromStrError("failed to parse header value from string"))
+    }
+}
+
+impl Display for FromStrError {
+    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
+        // `write!` rather than `writeln!`: a `Display` impl should not append a newline.
+        write!(f, "{}", self.0)
+    }
+}
+
+impl std::error::Error for FromStrError {}
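The pattern in this file is "validate once at construction, then hand out `&str` cheaply". A std-only sketch of that idea follows. `CheckedValue` and `InvalidValue` are hypothetical names, and the acceptance rule (tab plus printable ASCII) is an assumption made for the sketch, not a statement of what `HeaderValue` accepts. Unlike the file above, it uses the checked `str::from_utf8` instead of `unsafe`, trading a small runtime cost for not having to uphold the invariant by hand.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for `HeaderValueString`: bytes that were
/// validated once, when the value was constructed.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CheckedValue {
    bytes: Vec<u8>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidValue;

impl FromStr for CheckedValue {
    type Err = InvalidValue;

    fn from_str(src: &str) -> Result<Self, Self::Err> {
        // Assumption for this sketch: accept only tab and printable ASCII.
        let valid = src.bytes().all(|b| b == b'\t' || (0x20..0x7f).contains(&b));
        if valid {
            Ok(Self { bytes: src.as_bytes().to_vec() })
        } else {
            Err(InvalidValue)
        }
    }
}

impl CheckedValue {
    /// The only constructor is `from_str`, so the bytes are always valid UTF-8.
    pub fn as_str(&self) -> &str {
        std::str::from_utf8(&self.bytes).expect("validated in from_str")
    }
}

impl fmt::Display for CheckedValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No trailing newline: `Display` output should be exactly the value.
        f.write_str(self.as_str())
    }
}
```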
57 changes: 57 additions & 0 deletions rama-http/src/headers/x_robots_tag.rs
@@ -0,0 +1,57 @@
+use crate::headers::x_robots_tag_components::robots_tag_components::Parser;
+use crate::headers::x_robots_tag_components::RobotsTag;
+use crate::headers::Error;
+use headers::Header;
+use http::{HeaderName, HeaderValue};
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct XRobotsTag(Vec<RobotsTag>);
+
+impl Header for XRobotsTag {
+    fn name() -> &'static HeaderName {
+        &crate::header::X_ROBOTS_TAG
+    }
+
+    fn decode<'i, I>(values: &mut I) -> Result<Self, Error>
+    where
+        Self: Sized,
+        I: Iterator<Item = &'i HeaderValue>,
+    {
+        let elements = values.try_fold(Vec::new(), |mut acc, value| {
+            acc.extend(Parser::parse_value(value).map_err(|err| {
+                tracing::debug!(?err, "x-robots-tag header element decoding failure");
+                Error::invalid()
+            })?);
+
+            Ok(acc)
+        })?;
+
+        Ok(XRobotsTag(elements))
+    }
+
+    fn encode<E: Extend<HeaderValue>>(&self, values: &mut E) {
+        use std::fmt;
+        struct Format<F>(F);
+        impl<F> fmt::Display for Format<F>
+        where
+            F: Fn(&mut fmt::Formatter<'_>) -> fmt::Result,
+        {
+            fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+                self.0(f)
+            }
+        }
+        let s = format!(
+            "{}",
+            Format(|f: &mut fmt::Formatter<'_>| {
+                crate::headers::util::csv::fmt_comma_delimited(&mut *f, self.0.iter())
+            })
+        );
+        values.extend(Some(HeaderValue::from_str(&s).unwrap()))
+    }
+}
+
+impl FromIterator<RobotsTag> for XRobotsTag {
+    fn from_iter<T: IntoIterator<Item = RobotsTag>>(iter: T) -> Self {
+        Self(iter.into_iter().collect())
+    }
+}
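The `decode` above accumulates the elements of every header line with `try_fold` and aborts on the first line that fails to parse. The sketch below reproduces that control flow with std only. `parse_value`, `DecodeError`, and the "lowercase ASCII letters and underscores" rule are hypothetical simplifications; the real `Parser::parse_value` is not shown in this diff and also has to handle bot names and rule values.

```rust
/// Hypothetical stand-in for the crate's `Error::invalid()`.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError;

/// Hypothetical per-line parser: splits one header line into rules.
/// Assumption for this sketch: a rule is lowercase ASCII letters and underscores.
fn parse_value(value: &str) -> Result<Vec<String>, DecodeError> {
    value
        .split(',')
        .map(str::trim)
        .filter(|rule| !rule.is_empty())
        .map(|rule| {
            if rule.bytes().all(|b| b.is_ascii_lowercase() || b == b'_') {
                Ok(rule.to_owned())
            } else {
                Err(DecodeError)
            }
        })
        .collect()
}

/// Fold all header lines into one list, stopping at the first invalid line.
pub fn decode<'i, I>(values: &mut I) -> Result<Vec<String>, DecodeError>
where
    I: Iterator<Item = &'i str>,
{
    values.try_fold(
        Vec::new(),
        |mut acc: Vec<String>, value| -> Result<Vec<String>, DecodeError> {
            acc.extend(parse_value(value)?);
            Ok(acc)
        },
    )
}

/// Encode the rules as a single comma-delimited line.
pub fn encode(rules: &[String]) -> String {
    rules.join(", ")
}
```

With this shape, one malformed line rejects the whole header instead of yielding a partially decoded value.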
28 changes: 28 additions & 0 deletions rama-http/src/headers/x_robots_tag_components/custom_rule.rs
@@ -0,0 +1,28 @@
+use crate::headers::util::value_string::HeaderValueString;
+
+#[derive(Clone, Debug, Eq, PartialEq)]
+pub(super) struct CustomRule {
+    key: HeaderValueString,
+    value: Option<HeaderValueString>,
+}
+
+impl CustomRule {
+    pub(super) fn as_tuple(&self) -> (&HeaderValueString, &Option<HeaderValueString>) {
+        (&self.key, &self.value)
+    }
+}
+
+impl From<HeaderValueString> for CustomRule {
+    fn from(key: HeaderValueString) -> Self {
+        Self { key, value: None }
+    }
+}
+
+impl From<(HeaderValueString, HeaderValueString)> for CustomRule {
+    fn from(key_value: (HeaderValueString, HeaderValueString)) -> Self {
+        Self {
+            key: key_value.0,
+            value: Some(key_value.1),
+        }
+    }
+}