feat: add more tests. more tweaks. add batch parsing.

dreulavelle · Mar 27, 2024 · 23d8b2d
1 parent 9fcf5f8

Showing 15 changed files with 429 additions and 249 deletions.
diff --git a/.github/ISSUE_TEMPLATE/---bug-report.yml b/.github/ISSUE_TEMPLATE/---bug-report.yml
@@ -1,6 +1,9 @@
 name: "\U0001F41E Bug Report"
-labels: ["kind/bug", "status/triage"]
 description: "Rank Torrent Name (RTN) not working the way it is documented?"
+title: "[Bug]: "
+labels: ["kind/bug", "status/triage"]
+assignees:
+  - dreulavelle
 
 body:
   - type: markdown

diff --git a/.github/ISSUE_TEMPLATE/---feature-request.yml b/.github/ISSUE_TEMPLATE/---feature-request.yml
@@ -1,6 +1,9 @@
 name: "\U0001F381 Feature Request"
-labels: ["kind/feature", "status/triage"]
 description: "Did you find bugs, errors, or anything that isn't straightforward in the documentation?"
+title: "[Feature]: "
+labels: ["kind/feature", "status/triage"]
+assignees:
+  - dreulavelle
 
 body:
   - type: markdown

diff --git a/.github/ISSUE_TEMPLATE/---maintainers.yml b/.github/ISSUE_TEMPLATE/---maintainers.yml
@@ -1,6 +1,9 @@
 name: "\U0001F41E Request to Help in Development of RTN"
-labels: ["kind/maintainer", "status/triage"]
 description: "Want to help in the development of Rank Torrent Name (RTN)?"
+title: "[Help]: "
+labels: ["kind/maintainer", "status/triage"]
+assignees:
+  - dreulavelle
 
 body:
   - type: markdown

diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,9 @@
+# Pull Request Check List
+
+Resolves: #issue-number-here
+
+- [ ] Added **tests** for changed code.
+- [ ] Updated **documentation** for changed code.
+
+## Description:
+
diff --git a/.github/workflows/PULL_REQUEST_TEMPLATE.md b/.github/workflows/PULL_REQUEST_TEMPLATE.md
diff --git a/README.md b/README.md
@@ -131,35 +131,6 @@ Torrent(
     lev_ratio=0.95
 )
 ```
-## Torrent Parser
-
-You can also parse a torrent title similar to how PTN works. This is an enhanced version of PTN that combines RTN's parsing as well. This also includes enhanced episode parsing as well that covers a much better range of titles.
-
-Using the example above:
-
-```py
-from RTN import parse
-parsed = parse("Example.Movie.2020.1080p.BluRay.x264-Example")
-
-print(parsed.parsed_title) # Output: "Example Movie"
-print(parsed.year)         # Output: [2020]
-```
-
-We also set **coherent_types** to `True` from the PTN data that get's combined with RTN parsed metadata.
-
-## Checking Title Similarity
-
-Sometimes, you might just want to check if two titles match closely enough, without going through the entire ranking process. RTN provides a simple function, title_match, for this purpose:
-
-```py
-from RTN import title_match
-
-# Check if two titles are similar above a threshold of 0.9
-match = title_match("Correct Movie Title 2020", "Correct Movie Title (2020)")
-print(match)  # Output: True if similarity is above 0.9, otherwise False
-```
-
-This functionality is especially useful when you have a list of potential titles and want to find the best match for a given reference title.
 
 ## Understanding SettingsModel and RankingModel
 
@@ -286,26 +257,76 @@ Keep in mind that these are explicitly set within RTN and are needed in order fo
 
 Create as many `SettingsModel` and `RankingModel` as you like to use anywhere in your code. They are meant to be used as a way to version settings for your users.
 
-## Real World Example
+# Extras
+
+## Torrent Parser
+
+You can also parse a torrent title, similar to how PTN works. This is an enhanced version of PTN that also combines RTN's parsing, and it includes enhanced episode parsing that covers a much wider range of titles.
+
+Using the example above:
+
+```py
+from RTN import parse
+parsed = parse("Example.Movie.2020.1080p.BluRay.x264-Example")
+
+print(parsed.parsed_data.raw_title)    # Output: "Example.Movie.2020.1080p.BluRay.x264-Example"
+print(parsed.parsed_data.parsed_title) # Output: "Example Movie"
+print(parsed.parsed_data.year)         # Output: [2020]
+```
+
+> :warning: We also set **coherent_types** to `True` for the PTN data that gets combined with RTN's parsed metadata.
+> This ensures that all the types are uniform. **Everything is either a list of strings or ints, or a boolean.**
+
+## Checking Title Similarity
+
+Sometimes, you might just want to check if two titles match closely enough, without going through the entire ranking process. RTN provides a simple function, `title_match`, for this purpose:
+
+```py
+from RTN import title_match
+
+# Check if two titles are similar above a threshold of 0.9
+match = title_match("Correct Movie Title 2020", "Correct Movie Title (2020)")
+print(match)  # Output: True if similarity is above 0.9, otherwise False
+```
+
+This functionality is especially useful when you have a list of potential titles and want to find the best match for a given reference title.
+
+## Trash Check
+
+Maybe you just want to use our own garbage collector to weed out bad titles in your current scraping setup?
+
+```py
+from RTN import check_trash
+
+if check_trash(raw_title):
+    # You can safely remove any title or item from being scraped if this returns True!
+    ...
+```
+
+# Real World Example
 
 Here is a crude example of how you could use RTN in scraping.
 
 ```py
 from RTN import RTN, Torrent, DefaultRanking, title_match
 
-# Assuming 'settings' is defined somewhere and passed correctly
+# Assuming 'settings' is defined somewhere and passed correctly.
 rtn = RTN(settings=settings, ranking_model=DefaultRanking())
 ...
-# Define some function for scraping for results..
+# Define some function for scraping for results from some API.
     if response.ok:
         torrents = set()
         for stream in response.streams:
             if not stream.infohash or not title_match(correct_title, stream.title):
-                # Skip results that don't match the query
+                # Skip results that don't match the query.
+                # We want to do this first to weed out torrents
+                # that fall below the match threshold (default is 90%).
                 continue
             torrent: Torrent = rtn.rank(stream.title, stream.infohash)
             if torrent and torrent.fetch:
-                # Skip trash torrents by checking torrent.fetch
+                # Skip trash torrents by checking `torrent.fetch`.
+                # If torrent.fetch is True, then it's a good torrent,
+                # as considered by your ranking profile and settings model.
                 torrents.add(torrent)
 
         # Sort the list of torrents based on their rank in descending order
@@ -318,6 +339,46 @@ for torrent in sorted_torrents:
     print(f"Title: {torrent.parsed_data.parsed_title}, Infohash: {torrent.infohash}, Rank: {torrent.rank}")
 ```
 
+# ParsedData Structure
+
+Here are all of the attributes of `parsed_data`, along with their default values:
+
+```py
+class ParsedData(BaseModel):
+    """Parsed data model for a torrent title."""
+
+    raw_title: str
+    parsed_title: str
+    fetch: bool = False
+    is_4k: bool = False
+    is_multi_audio: bool = False
+    is_multi_subtitle: bool = False
+    is_complete: bool = False
+    year: List[int] = []
+    resolution: List[str] = []
+    quality: List[str] = []
+    season: List[int] = []
+    episode: List[int] = []
+    codec: List[str] = []
+    audio: List[str] = []
+    subtitles: List[str] = []
+    language: List[str] = []
+    bitDepth: List[int] = []
+    hdr: str | bool = False
+    proper: bool = False
+    repack: bool = False
+    remux: bool = False
+    upscaled: bool = False
+    remastered: bool = False
+    directorsCut: bool = False
+    extended: bool = False
+    excess: list = []
+```
+
+This list will continue to grow as we expand functionality, so keep checking back!
+
+> :warning: Don't see something you want in the list? Submit a [Feature Request](https://github.com/dreulavelle/rank-torrent-name/issues/new?assignees=dreulavelle&labels=kind%2Ffeature%2Cstatus%2Ftriage&projects=&template=---feature-request.yml) to have it added!
+
 ## Contributing
 
 Contributions to RTN are welcomed! Feel free to submit pull requests or open issues to suggest features or report bugs. As we grow, more features will be coming to RTN, there's already a lot planned!
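
Note on the README's "Checking Title Similarity" section: the implementation of `title_match` is not part of this diff, so the following is only a rough sketch of threshold-based matching. It swaps in the standard library's `difflib.SequenceMatcher` as a stand-in for whatever similarity ratio RTN actually computes; the function name `title_match_sketch`, the lowercasing step, and the `>=` comparison are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def title_match_sketch(correct_title: str, raw_title: str, threshold: float = 0.9) -> bool:
    """Hypothetical stand-in for RTN's title_match (not the library's code).

    Returns True when the similarity ratio of the two titles reaches the threshold.
    """
    # Compare case-insensitively; this normalization is an assumption.
    ratio = SequenceMatcher(None, correct_title.lower(), raw_title.lower()).ratio()
    return ratio >= threshold


# The pair from the README example differs only by the parentheses.
close = title_match_sketch("Correct Movie Title 2020", "Correct Movie Title (2020)")
# An unrelated title should fall well below the threshold.
unrelated = title_match_sketch("Correct Movie Title 2020", "Another Film 1999")
print(close, unrelated)
```

Running the cheap similarity check before `rtn.rank(...)`, as the Real World Example does, means unrelated results are discarded before any parsing or ranking work is spent on them.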

diff --git a/RTN/__init__.py b/RTN/__init__.py
@@ -1,13 +1,14 @@
 from .fetch import check_fetch, check_trash
 from .models import BaseRankingModel, DefaultRanking, ParsedData, SettingsModel
-from .parser import RTN, Torrent, parse, sort, title_match
+from .parser import RTN, Torrent, batch_parse, parse, sort, title_match
 from .patterns import parse_extras
 from .ranker import get_rank
 
 __all__ = [
     "RTN",
     "Torrent",
     "parse",
+    "batch_parse",
     "get_rank",
     "check_fetch",
     "check_trash",

diff --git a/RTN/fetch.py b/RTN/fetch.py
@@ -1,19 +1,21 @@
 import regex
 
 from .models import ParsedData, SettingsModel
-from .patterns import TRASH_COMPILED
+from .patterns import IS_TRASH_COMPILED
 
 
 def check_trash(raw_title: str) -> bool:
     """Check if the title contains unwanted patterns."""
     if not raw_title or not isinstance(raw_title, str):
         raise TypeError("The input title must be a non-empty string.")
-    return not any(pattern.search(raw_title) for pattern in TRASH_COMPILED)
+    # True if we find any of the trash patterns in the title.
+    # You can safely remove any title from being scraped if this returns True!
+    return any(pattern.search(raw_title) for pattern in IS_TRASH_COMPILED)
 
 
 def check_fetch(data: ParsedData, settings: SettingsModel) -> bool:
     """Check user settings and unwanted quality to determine if torrent should be fetched."""
-    if not check_trash(data.raw_title):
+    if check_trash(data.raw_title):
         return False
     if settings.require and any(
         pattern.search(data.raw_title) for pattern in settings.require if pattern  # type: ignore
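
Note on the `RTN/fetch.py` change above: `check_trash` used to return `True` for a clean title and now returns `True` when a trash pattern is found, so any caller written against the old behavior needs its condition flipped (as `check_fetch` does in this hunk). A minimal sketch of the new semantics follows. The pattern list is a hypothetical stand-in, since the real `IS_TRASH_COMPILED` is not shown in this diff, and the standard `re` module is used in place of `regex` to keep the snippet dependency-free.

```python
import re

# Hypothetical stand-in for RTN.patterns.IS_TRASH_COMPILED (real contents not shown here).
IS_TRASH_COMPILED = [
    re.compile(p, re.IGNORECASE) for p in (r"\bCAM\b", r"\bTeleSync\b", r"\bHDTS\b")
]


def check_trash(raw_title: str) -> bool:
    """Return True if the title matches any trash pattern (new behavior in this commit)."""
    if not raw_title or not isinstance(raw_title, str):
        raise TypeError("The input title must be a non-empty string.")
    return any(pattern.search(raw_title) for pattern in IS_TRASH_COMPILED)


titles = [
    "Example.Movie.2020.1080p.BluRay.x264-Example",
    "Example.Movie.2020.CAM.x264-Example",
]

# Keep only the titles that are NOT flagged as trash.
kept = [title for title in titles if not check_trash(title)]
print(kept)
```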

diff --git a/RTN/models.py b/RTN/models.py
@@ -42,36 +42,41 @@ class BaseRankingModel(BaseModel):
     The ranking values are used to determine the quality of a media item based on its attributes.
 
     Attributes:
-        uhd (int): The ranking value for Ultra HD (4K) resolution.
-        fhd (int): The ranking value for Full HD (1080p) resolution.
-        hd (int): The ranking value for HD (720p) resolution.
-        sd (int): The ranking value for SD (480p) resolution.
-        bluray (int): The ranking value for Blu-ray quality.
-        hdr (int): The ranking value for HDR quality.
-        hdr10 (int): The ranking value for HDR10 quality.
-        dolby_video (int): The ranking value for Dolby video quality.
-        dts_x (int): The ranking value for DTS:X audio quality.
-        dts_hd (int): The ranking value for DTS-HD audio quality.
-        dts_hd_ma (int): The ranking value for DTS-HD Master Audio audio quality.
-        atmos (int): The ranking value for Dolby Atmos audio quality.
-        truehd (int): The ranking value for Dolby TrueHD audio quality.
-        ddplus (int): The ranking value for Dolby Digital Plus audio quality.
-        ac3 (int): The ranking value for AC3 audio quality.
-        aac (int): The ranking value for AAC audio quality.
-        remux (int): The ranking value for remux attribute.
-        webdl (int): The ranking value for web-dl attribute.
-        repack (int): The ranking value for repack attribute.
-        proper (int): The ranking value for proper attribute.
-        dubbed (int): The ranking value for dubbed attribute.
-        subbed (int): The ranking value for subbed attribute.
-        av1 (int): The ranking value for AV1 attribute.
+        `uhd` (int): The ranking value for Ultra HD (4K) resolution.
+        `fhd` (int): The ranking value for Full HD (1080p) resolution.
+        `hd` (int): The ranking value for HD (720p) resolution.
+        `sd` (int): The ranking value for SD (480p) resolution.
+        `bluray` (int): The ranking value for Blu-ray quality.
+        `hdr` (int): The ranking value for HDR quality.
+        `hdr10` (int): The ranking value for HDR10 quality.
+        `dolby_video` (int): The ranking value for Dolby video quality.
+        `dts_x` (int): The ranking value for DTS:X audio quality.
+        `dts_hd` (int): The ranking value for DTS-HD audio quality.
+        `dts_hd_ma` (int): The ranking value for DTS-HD Master Audio audio quality.
+        `atmos` (int): The ranking value for Dolby Atmos audio quality.
+        `truehd` (int): The ranking value for Dolby TrueHD audio quality.
+        `ddplus` (int): The ranking value for Dolby Digital Plus audio quality.
+        `ac3` (int): The ranking value for AC3 audio quality.
+        `aac` (int): The ranking value for AAC audio quality.
+        `remux` (int): The ranking value for remux attribute.
+        `webdl` (int): The ranking value for web-dl attribute.
+        `repack` (int): The ranking value for repack attribute.
+        `proper` (int): The ranking value for proper attribute.
+        `dubbed` (int): The ranking value for dubbed attribute.
+        `subbed` (int): The ranking value for subbed attribute.
+        `av1` (int): The ranking value for AV1 attribute.
+    
+    Note:
+        - The higher the ranking value, the better the quality of the media item.
+        - The default ranking values are set to 0, which means that the attribute does not affect the overall rank.
+        - Users can customize the ranking values based on their preferences and requirements by using inheritance.
     """
 
     # resolution
-    uhd: int = 0
-    fhd: int = 0
-    hd: int = 0
-    sd: int = 0
+    uhd: int = 0 # 4K
+    fhd: int = 0 # 1080p
+    hd: int = 0  # 720p
+    sd: int = 0  # 480p
     # quality
     bluray: int = 0
     hdr: int = 0
@@ -98,7 +103,7 @@ class BaseRankingModel(BaseModel):
 
 
 class DefaultRanking(BaseRankingModel):
-    """Default ranking model for users to use."""
+    """Default ranking model preset that should cover most common use cases."""
 
     uhd: int = 140
     fhd: int = 100
@@ -209,12 +214,11 @@ def compile_and_validate_patterns(cls, values: dict[str, Any]) -> dict[str, Any]
             compiled_patterns = []
             for pattern in raw_patterns:
                 if isinstance(pattern, str):
-                    # Compile the pattern, taking into account your custom syntax for options like case-sensitivity
-                    if pattern.startswith("/") and pattern.endswith("/i"):
+                    if pattern.startswith("/") and pattern.endswith("/i"): # case-insensitive
                         compiled_patterns.append(regex.compile(pattern[1:-2], regex.IGNORECASE))
-                    elif pattern.startswith("/") and pattern.endswith("/"):
+                    elif pattern.startswith("/") and pattern.endswith("/"): # case-sensitive
                         compiled_patterns.append(regex.compile(pattern[1:-1]))
-                    else:
+                    else: # case-insensitive by default
                         compiled_patterns.append(regex.compile(pattern, regex.IGNORECASE))
                 elif isinstance(pattern, regex.Pattern):
                     # Keep already compiled patterns as is
@@ -226,16 +230,3 @@ def compile_and_validate_patterns(cls, values: dict[str, Any]) -> dict[str, Any]
 
     class Config:
         arbitrary_types_allowed = True
-
-    def __getitem__(self, key: str) -> CustomRank:
-        """Allows direct access to custom rank settings."""
-        return self.custom_ranks.get(key, CustomRank())
-
-    def __setitem__(self, key: str, value: CustomRank):
-        """Enables setting custom rank settings."""
-        self.custom_ranks[key] = value
-
-    def __delitem__(self, key: str):
-        """Allows deletion of custom rank settings."""
-        if key in self.custom_ranks:
-            del self.custom_ranks[key]
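
Note on the `compile_and_validate_patterns` hunk above: the new inline comments spell out three cases for string patterns (`/.../i` is case-insensitive, `/.../` is case-sensitive, and a bare string is case-insensitive by default). The sketch below isolates that branch logic in a standalone helper so the convention can be tried in isolation. The name `compile_pattern` is hypothetical, the error raised for unsupported types is an assumption, and the standard `re` module stands in for `regex`.

```python
import re
from typing import Union


def compile_pattern(pattern: Union[str, "re.Pattern[str]"]) -> "re.Pattern[str]":
    """Hypothetical helper mirroring the three string cases from the diff."""
    if isinstance(pattern, re.Pattern):
        # Keep already compiled patterns as is.
        return pattern
    if not isinstance(pattern, str):
        # Assumption: reject anything that is neither a string nor a compiled pattern.
        raise ValueError(f"Unsupported pattern type: {type(pattern).__name__}")
    if pattern.startswith("/") and pattern.endswith("/i"):  # case-insensitive
        return re.compile(pattern[1:-2], re.IGNORECASE)
    if pattern.startswith("/") and pattern.endswith("/"):  # case-sensitive
        return re.compile(pattern[1:-1])
    return re.compile(pattern, re.IGNORECASE)  # case-insensitive by default


sensitive = compile_pattern("/HDR/")
insensitive = compile_pattern("/HDR/i")
default = compile_pattern("hdr")

print(bool(sensitive.search("movie.hdr.2160p")))    # delimiters without a flag: exact case only
print(bool(insensitive.search("movie.hdr.2160p")))  # trailing /i: case ignored
print(bool(default.search("Movie.HDR.2160p")))      # bare string: case ignored
```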