Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 382/ support X-Robots-Tag as a typed http header XRobotsTag #393

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hafihaf123
Copy link
Contributor

@hafihaf123 hafihaf123 commented Jan 12, 2025

Initial Implementation of X-Robots-Tag Header

This pull request introduces an initial implementation of the X-Robots-Tag header for the rama-http crate and closes #382

Summary of Changes:

  1. Header Registration:

    • Added the x-robots-tag header to the static headers list in rama-http-types/src/lib.rs.
  2. Implementation of the Header:

    • Created a XRobotsTag struct that represents the header and implements the rama_http::headers::Header trait.
    • Defined a Rule enum to represent indexing rules such as noindex, nofollow, and max-snippet. Complex rules like max-image-preview and unavailable_after are also included.
    • Designed an Element struct to represent a rule optionally associated with a bot name.
    • Added an iterator for XRobotsTag to iterate over its elements.
  3. File Structure:

    • The implementation resides in a new module at rama-http/src/headers/x_robots_tag/, which includes the following files:
      • rule.rs: Defines the Rule enum and parsing logic.
      • element.rs: Implements the Element struct and its parsing/formatting logic.
      • iterator.rs: Provides an iterator for XRobotsTag.
      • mod.rs: Combines and exposes the module’s functionality.
  4. Encoding and Decoding:

    • Encoding and decoding are implemented using the Header trait, supporting CSV-style comma-separated values.

Questions and Feedback Requested:

  1. Code Structure:

    • Is the location of the new module (x_robots_tag) appropriate?
    • Does the filename structure and organization align with project conventions?
  2. Implementation Design:

    • Are there any improvements or alternative approaches you’d suggest for the Rule and Element structs?
    • Is the use of Vec<Element> for XRobotsTag suitable, or are there optimizations to consider?
  3. Edge Cases and Standards:

    • Are there additional rules, formats, or edge cases that should be covered?

Testing:

  • For now, the module is not tested, I will add the tests after the code structure becomes stable.

I look forward to your feedback and suggestions for improvement. Thank you!

@hafihaf123 hafihaf123 force-pushed the issue-382/x-robots-tag branch from 1a46b37 to f3c6d97 Compare January 12, 2025 22:01
Copy link
Member

@GlenDC GlenDC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad start. Do take your time to complete it and add sufficient start. But great start indeed. Added a couple of remarks to help you with some more guidance.

@@ -131,6 +131,9 @@ pub mod header {
"x-real-ip",
];

// non-std web-crawler info headers
static_header!["x-robots-tag",];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static_header!["x-robots-tag",];
//
// More infornation at
// <https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Robots-Tag>.
static_header!["x-robots-tag"];


#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Element {
bot_name: Option<String>, // or `rama_ua::UserAgent` ???
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Element {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending how you structure it, this actually has to be either:

struct Element { bot_name: Option<HeaderValueString>, rules: Vec<Rule> }

or

enum Element {
    BotName(HeaderValueString),
    Rule(Rule),
}

Because when a botname is mentioned it applies to all rules that follow it, until another botname is mentioned

fn from_str(s: &str) -> Result<Self, Self::Err> {
let (bot_name, indexing_rule) = match Rule::from_str(s) {
Ok(rule) => (None, Ok(rule)),
Err(e) => match *s.split(":").map(str::trim).collect::<Vec<_>>().as_slice() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to anyway modify the Element code, so this might dissapear, but as extra info, be careful with collecting stuff in a Vec, just for tmp reasons. Gives allocations for not much good. Instead better to split fixed into a tuple, or use the iterator directly .

MaxVideoPreview(Option<u32>),
NoTranslate,
NoImageIndex,
UnavailableAfter(String), // "A date must be specified in a format such as RFC 822, RFC 850, or ISO 8601."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This best be validated and enforced in a typed manner :)

use std::str::FromStr;

#[derive(Clone, Debug, Eq, PartialEq)]
pub(super) enum Rule {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget about the custom ones: noai, noimageai, SPC

}
}

impl TryFrom<&[&str]> for Rule {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think it's worth implementing a public trait from this. You can just make simple private functions from this, and I would instead of collecting it into a vec just use the iterator's next method. You can then have a private fn for 1 arg en one with 2 arg.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support X-Robots-Tag as a typed http header XRobotsTag
2 participants