-
-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 382/ support X-Robots-Tag as a typed http header XRobotsTag #393
base: main
Are you sure you want to change the base?
Conversation
1a46b37
to
f3c6d97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bad start. Do take your time to complete it and add sufficient start. But great start indeed. Added a couple of remarks to help you with some more guidance.
@@ -131,6 +131,9 @@ pub mod header { | |||
"x-real-ip", | |||
]; | |||
|
|||
// non-std web-crawler info headers | |||
static_header!["x-robots-tag",]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_header!["x-robots-tag",]; | |
// | |
// More infornation at | |
// <https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Robots-Tag>. | |
static_header!["x-robots-tag"]; |
|
||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct Element { | ||
bot_name: Option<String>, // or `rama_ua::UserAgent` ??? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can copy https://github.com/hyperium/headers/blob/master/src/util/value_string.rs to https://github.com/plabayo/rama/tree/main/rama-http/src/headers/util and use that one here, that's also what the UserAgent typed header uses: https://docs.rs/headers/0.4.0/src/headers/common/user_agent.rs.html#43
use std::str::FromStr; | ||
|
||
#[derive(Debug, Clone, PartialEq, Eq)] | ||
pub struct Element { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending how you structure it, this actually has to be either:
struct Element { bot_name: Option<HeaderValueString>, rules: Vec<Rule> }
or
enum Element {
BotName(HeaderValueString),
Rule(Rule),
}
Because when a botname is mentioned it applies to all rules that follow it, until another botname is mentioned
fn from_str(s: &str) -> Result<Self, Self::Err> { | ||
let (bot_name, indexing_rule) = match Rule::from_str(s) { | ||
Ok(rule) => (None, Ok(rule)), | ||
Err(e) => match *s.split(":").map(str::trim).collect::<Vec<_>>().as_slice() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to anyway modify the Element code, so this might dissapear, but as extra info, be careful with collecting stuff in a Vec, just for tmp reasons. Gives allocations for not much good. Instead better to split fixed into a tuple, or use the iterator directly .
MaxVideoPreview(Option<u32>), | ||
NoTranslate, | ||
NoImageIndex, | ||
UnavailableAfter(String), // "A date must be specified in a format such as RFC 822, RFC 850, or ISO 8601." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This best be validated and enforced in a typed manner :)
use std::str::FromStr; | ||
|
||
#[derive(Clone, Debug, Eq, PartialEq)] | ||
pub(super) enum Rule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget about the custom ones: noai, noimageai, SPC
} | ||
} | ||
|
||
impl TryFrom<&[&str]> for Rule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think it's worth implementing a public trait from this. You can just make simple private functions from this, and I would instead of collecting it into a vec just use the iterator's next method. You can then have a private fn for 1 arg en one with 2 arg.
Initial Implementation of
X-Robots-Tag
HeaderThis pull request introduces an initial implementation of the
X-Robots-Tag
header for therama-http
crate and closes #382Summary of Changes:
Header Registration:
x-robots-tag
header to the static headers list inrama-http-types/src/lib.rs
.Implementation of the Header:
XRobotsTag
struct that represents the header and implements therama_http::headers::Header
trait.Rule
enum to represent indexing rules such asnoindex
,nofollow
, andmax-snippet
. Complex rules likemax-image-preview
andunavailable_after
are also included.Element
struct to represent a rule optionally associated with a bot name.XRobotsTag
to iterate over its elements.File Structure:
rama-http/src/headers/x_robots_tag/
, which includes the following files:rule.rs
: Defines theRule
enum and parsing logic.element.rs
: Implements theElement
struct and its parsing/formatting logic.iterator.rs
: Provides an iterator forXRobotsTag
.mod.rs
: Combines and exposes the module’s functionality.Encoding and Decoding:
Header
trait, supporting CSV-style comma-separated values.Questions and Feedback Requested:
Code Structure:
x_robots_tag
) appropriate?Implementation Design:
Rule
andElement
structs?Vec<Element>
forXRobotsTag
suitable, or are there optimizations to consider?Edge Cases and Standards:
Testing:
I look forward to your feedback and suggestions for improvement. Thank you!