fix: update parse tests. other improvements.

dreulavelle · Mar 27, 2024 · b3a92fe
1 parent c4ab21c

Showing 8 changed files with 187 additions and 107 deletions.
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -32,4 +32,4 @@ jobs:
       run: poetry publish --build
       env:
         POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
-      continue-on-error: true
+      continue-on-error: true  # Coverage tests should still run even if publishing fails
diff --git a/README.md b/README.md
@@ -1,11 +1,21 @@
-<center> 
+<div align="center">
 
-# Rank Torrent Name (RTN)
+<h1>Rank Torrent Name (RTN)</h1>
 
-[![PyPI version](https://badge.fury.io/py/rank-torrent-name.svg)](https://badge.fury.io/py/rank-torrent-name) ![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/dreulavelle/rank-torrent-name/battery.yml?style=flat) ![GitHub License](https://img.shields.io/github/license/dreulavelle/rank-torrent-name)
- [![Coverage Status](https://coveralls.io/repos/github/dreulavelle/rank-torrent-name/badge.svg?branch=main)](https://coveralls.io/github/dreulavelle/rank-torrent-name?branch=main)
+<a href="https://codecov.io/gh/dreulavelle/rank-torrent-name"> 
+ <img src="https://codecov.io/gh/dreulavelle/rank-torrent-name/graph/badge.svg?token=V9S89GSUKM"/> 
+</a>
 
-</center>
+<a href="https://badge.fury.io/py/rank-torrent-name">
+    <img src="https://badge.fury.io/py/rank-torrent-name.svg" alt="PyPI version" />
+</a>
+
+<img src="https://img.shields.io/github/actions/workflow/status/dreulavelle/rank-torrent-name/battery.yml" alt="GitHub Actions Workflow Status" />
+
+<img src="https://img.shields.io/github/license/dreulavelle/rank-torrent-name" alt="GitHub License" />
+
+</div>
+<br>
 
 **Rank Torrent Name (RTN)** is a Python library designed to parse and rank torrent names based on customizable criteria. It allows users to define their preferences for filtering and ranking torrents, providing a detailed analysis of each torrent's metadata. RTN is perfect for automating the selection of torrents based on quality, resolution, audio, and more.
 
@@ -79,10 +89,10 @@ We cover a lot already, so users are able to add their own custom regex patterns
 #### Understanding Fetch and Enable:
 
 - `fetch`: Determines if RTN should consider a torrent for downloading based on the attribute. True means RTN will fetch torrents matching this criterion.
-- `enable`: Controls whether the custom rank value is used in the overall ranking calculation. Disabling it reverts to the default ranking for that attribute.
+- `enable`: Controls whether the custom rank value is used in the overall ranking calculation. Disabling it falls back to the value from the ranking model you set. This is useful for toggling custom ranks on and off from a user's perspective.
 - `rank`: Sets the rank value used to grade that item.
 
-For instance, if we detect a title is **4K** or **2160p** then we use the `uhd` ranking, and add **+120** points. The same goes for the rest of the strings in `custom_ranks`.
+For example, if we detect a title is **4K** or **2160p** then we use the `uhd` ranking, and add **+120** points. The same goes for the rest of the strings in `custom_ranks`.
 
 Settings can be easily adjusted at runtime if needed. To enable or disable a specific rank dynamically:
 
@@ -105,7 +115,7 @@ torrent = rtn.rank("Example.Movie.2020.1080p.BluRay.x264-Example", "infohash1234
 3. **Inspecting the Torrent Object:** The returned `Torrent` object includes parsed data and a rank. Access its properties to understand its quality:
 
 ```python
-print(f"Title: {torrent.parsed_data.parsed_title}, Rank: {torrent.rank}")
+print(f"Title: {torrent.data.parsed_title}, Rank: {torrent.rank}")
 ```
 
 ### Sorting Torrents
@@ -125,7 +135,7 @@ A `Torrent` object encapsulates metadata about a torrent, such as its title, par
 Torrent(
     raw_title="Example.Movie.2020.1080p.BluRay.x264-Example",
     infohash="infohash123456",
-    parsed_data=ParsedData(parsed_title='Example Movie', ...),
+    data=ParsedData(parsed_title='Example Movie', ...),
     fetch=True,
     rank=150,
     lev_ratio=0.95
@@ -269,9 +279,9 @@ Using the example above:
 from RTN import parse
 parsed = parse("Example.Movie.2020.1080p.BluRay.x264-Example")
 
-print(parsed.parsed_data.raw_title)    # Output: "Example.Movie.2020.1080p.BluRay.x264-Example"
-print(parsed.parsed_data.parsed_title) # Output: "Example Movie"
-print(parsed.parsed_data.year)         # Output: [2020]
+print(parsed.data.raw_title)    # Output: "Example.Movie.2020.1080p.BluRay.x264-Example"
+print(parsed.data.parsed_title) # Output: "Example Movie"
+print(parsed.data.year)         # Output: [2020]
 ```
 
 > :warning: We also set **coherent_types** to `True` for the PTN data that gets combined with RTN parsed metadata.
@@ -336,12 +346,14 @@ rtn = RTN(settings=settings, ranking_model=DefaultRanking())
 
 # Example usage
 for torrent in sorted_torrents:
-    print(f"Title: {torrent.parsed_data.parsed_title}, Infohash: {torrent.infohash}, Rank: {torrent.rank}")
+    print(f"Title: {torrent.data.parsed_title}, Infohash: {torrent.infohash}, Rank: {torrent.rank}")
 ```
 
 # ParsedData Structure
 
-Here is all of the attributes of `parsed_data` along with their default values:
+Here are all of the attributes of `data` from the `Torrent` object, along with their default values.
+
+They are accessible at `torrent.data`, for example `torrent.data.resolution`.
 
 ```py
 class ParsedData(BaseModel):

diff --git a/RTN/parser.py b/RTN/parser.py
@@ -19,7 +19,7 @@ class Torrent(BaseModel):
     Attributes:
         `raw_title` (str): The original title of the torrent.
         `infohash` (str): The SHA-1 hash identifier of the torrent.
-        `parsed_data` (ParsedData): Metadata extracted from the torrent title including PTN parsing and additional extras.
+        `data` (ParsedData): Metadata extracted from the torrent title including PTN parsing and additional extras.
         `fetch` (bool): Indicates whether the torrent meets the criteria for fetching based on user settings.
         `rank` (int): The computed ranking score of the torrent based on user-defined preferences.
         `lev_ratio` (float): The Levenshtein ratio comparing the parsed title and the raw title for similarity.
@@ -30,18 +30,16 @@ class Torrent(BaseModel):
 
     raw_title: str
     infohash: str
-    parsed_data: ParsedData
+    data: ParsedData
     fetch: bool = False
     rank: int = 0
     lev_ratio: float = 0.0
 
     @validator("raw_title", "infohash")
     def validate_strings(cls, v):
         """Ensures raw_title and infohash are strings."""
-        if not v:
-            raise ValueError("Value cannot be empty.")
-        if not isinstance(v, str):
-            raise ValueError("Value must be a string.")
+        if not v or not isinstance(v, str):
+            raise TypeError("The title and infohash must be non-empty strings.")
         return v
 
     @validator("infohash")
@@ -92,12 +90,10 @@ def rank(self, raw_title: str, infohash: str) -> Torrent:
             raise ValueError("The infohash must be a valid SHA-1 hash and 40 characters in length.")
 
         parsed_data = parse(raw_title)
-        if not parsed_data:
-            raise ValueError(f"Failed to parse the title: {raw_title}")
         return Torrent(
             raw_title=raw_title,
             infohash=infohash,
-            parsed_data=parsed_data,
+            data=parsed_data,
             fetch=check_fetch(parsed_data, self.settings),
             rank=get_rank(parsed_data, self.settings, self.ranking_model),
             lev_ratio=Levenshtein.ratio(parsed_data.parsed_title.lower(), raw_title.lower()),
@@ -117,11 +113,11 @@ def parse(raw_title: str) -> ParsedData:
     if not raw_title or not isinstance(raw_title, str):
         raise TypeError("The input title must be a non-empty string.")
 
-    parsed_dict: dict[str, Any] = PTN.parse(raw_title, coherent_types=True)  # Imagine this returns a dict
-    extras: dict[str, Any] = parse_extras(raw_title)  # Returns additional fields as a dict
-    full_data = {**parsed_dict, **extras}  # Merge PTN parsed data with extras
-    full_data["raw_title"] = raw_title  # Add the raw title to the data
-    full_data["parsed_title"] = parsed_dict.get("title")  # Add the parsed title to the data
+    parsed_dict: dict[str, Any] = PTN.parse(raw_title, coherent_types=True)
+    extras: dict[str, Any] = parse_extras(raw_title)
+    full_data = {**parsed_dict, **extras}  # Merge PTN parsed data with RTN extras.
+    full_data["raw_title"] = raw_title
+    full_data["parsed_title"] = parsed_dict.get("title")
     return ParsedData(**full_data)
 
 
@@ -130,20 +126,21 @@ def parse_chunk(chunk: List[str]) -> List[ParsedData]:
     return [parse(title) for title in chunk]
 
 
-def batch_parse(titles: List[str], chunk_size: int = 50) -> List[ParsedData]:
+def batch_parse(titles: List[str], chunk_size: int = 50, max_workers: int = 4) -> List[ParsedData]:
     """
     Parses a list of torrent titles in batches for improved performance.
 
     Args:
         titles (List[str]): A list of torrent titles to parse.
         chunk_size (int): The number of titles to process in each batch.
+        max_workers (int): The maximum number of worker threads to use for parsing.
 
     Returns:
         List[ParsedData]: A list of ParsedData objects for each title.
     """
     chunks = [titles[i : i + chunk_size] for i in range(0, len(titles), chunk_size)]
     parsed_data = []
-    with ThreadPoolExecutor() as executor:
+    with ThreadPoolExecutor(max_workers=max_workers) as executor:
         future_to_chunk = {executor.submit(parse_chunk, chunk): chunk for chunk in chunks}
         for future in as_completed(future_to_chunk):
             chunk_result = future.result()
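
A note on the hunk above: `as_completed` yields futures in completion order, so if chunk results are collected as they finish, the output may not line up with the input `titles`. The following is a minimal sketch of an order-preserving variant, not the library's implementation. `fake_parse` and `batch_parse_ordered` are hypothetical names, and `fake_parse` only stands in for RTN's `parse`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_parse(title: str) -> str:
    """Hypothetical stand-in for RTN's parse(); it only normalizes separators."""
    return title.replace(".", " ")


def parse_chunk(chunk: List[str]) -> List[str]:
    """Parse every title in one chunk."""
    return [fake_parse(title) for title in chunk]


def batch_parse_ordered(titles: List[str], chunk_size: int = 50, max_workers: int = 4) -> List[str]:
    """Parse titles in chunks on a bounded thread pool, keeping input order."""
    chunks = [titles[i : i + chunk_size] for i in range(0, len(titles), chunk_size)]
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in submission order, unlike as_completed.
        for chunk_result in executor.map(parse_chunk, chunks):
            results.extend(chunk_result)
    return results


titles = [f"Show.S01E{n:02d}.1080p" for n in range(1, 8)]
parsed = batch_parse_ordered(titles, chunk_size=2, max_workers=3)
```

Here `parsed[i]` always corresponds to `titles[i]`, whichever chunk finishes first.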
@@ -163,10 +160,12 @@ def title_match(correct_title: str, raw_title: str, threshold: float = 0.9) -> b
     Returns:
         bool: True if the titles are similar above the specified threshold; False otherwise.
     """
-    if not correct_title or not raw_title:
+    if not (correct_title and raw_title):
         raise ValueError("Both titles must be provided.")
     if not isinstance(correct_title, str) or not isinstance(raw_title, str):
         raise TypeError("Both titles must be strings.")
+    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
+        raise ValueError("The threshold must be a float between 0 and 1.")
     return Levenshtein.ratio(correct_title.lower(), raw_title.lower()) >= threshold
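
The validation order in `title_match` can be exercised with a standard-library sketch. This is an illustration only: it swaps `Levenshtein.ratio` for `difflib.SequenceMatcher.ratio`, a different similarity measure, so the example needs no third-party package. `title_match_sketch` is a hypothetical name, and it rejects the call when either title is empty, in line with the error message.

```python
from difflib import SequenceMatcher


def title_match_sketch(correct_title: str, raw_title: str, threshold: float = 0.9) -> bool:
    """Return True when the two titles are at least `threshold` similar."""
    # Reject the call if either title is missing or empty.
    if not (correct_title and raw_title):
        raise ValueError("Both titles must be provided.")
    if not isinstance(correct_title, str) or not isinstance(raw_title, str):
        raise TypeError("Both titles must be strings.")
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        raise ValueError("The threshold must be a number between 0 and 1.")
    # Stand-in similarity measure (assumption): difflib instead of Levenshtein.
    ratio = SequenceMatcher(None, correct_title.lower(), raw_title.lower()).ratio()
    return ratio >= threshold


same = title_match_sketch("The Matrix", "the matrix")
different = title_match_sketch("The Matrix", "Inception")
```

Because both titles are lowercased first, `same` is `True`; an out-of-range `threshold` such as `1.5` raises `ValueError` before any comparison runs.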
 
 

diff --git a/RTN/patterns.py b/RTN/patterns.py
@@ -7,7 +7,7 @@ def compile_patterns(patterns):
     return [regex.compile(pattern, regex.IGNORECASE) for pattern in patterns]
 
 
-# Pattern for identifying unwanted quality. This will set `parsed_data.fetch`.
+# Pattern for identifying unwanted quality. This will set `data.fetch`.
 IS_TRASH_COMPILED = compile_patterns(
     [
         r"\b(?:H[DQ][ .-]*)?CAM(?:H[DQ])?(?:[ .-]*Rip)?\b",

diff --git a/RTN/ranker.py b/RTN/ranker.py
@@ -3,12 +3,12 @@
 from .models import BaseRankingModel, ParsedData, SettingsModel
 
 
-def get_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
+def get_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
     """
     Calculate the ranking of the given parsed data.
 
     Parameters:
-        parsed_data (ParsedData): The parsed data object containing information about the torrent title.
+        data (ParsedData): The parsed data object containing information about the torrent title.
         settings (SettingsModel): The user settings object containing custom ranking models.
         rank_model (BaseRankingModel): The base ranking model used for calculating the ranking.
 
@@ -19,47 +19,47 @@ def get_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseR
         ValueError: If the parsed data is empty.
         TypeError: If the parsed data is not a ParsedData object.
     """
-    if not parsed_data:
+    if not data:
         raise ValueError("Parsed data cannot be empty.")
-    if not isinstance(parsed_data, ParsedData):
+    if not isinstance(data, ParsedData):
         raise TypeError("Parsed data must be an instance of ParsedData.")
 
-    rank: int = calculate_resolution_rank(parsed_data, settings, rank_model)
-    rank += calculate_quality_rank(parsed_data, settings, rank_model)
-    rank += calculate_codec_rank(parsed_data, settings, rank_model)
-    rank += calculate_audio_rank(parsed_data, settings, rank_model)
-    rank += calculate_other_ranks(parsed_data, settings, rank_model)
-    rank += calculate_preferred(parsed_data, settings)
-    if parsed_data.repack:
+    rank: int = calculate_resolution_rank(data, settings, rank_model)
+    rank += calculate_quality_rank(data, settings, rank_model)
+    rank += calculate_codec_rank(data, settings, rank_model)
+    rank += calculate_audio_rank(data, settings, rank_model)
+    rank += calculate_other_ranks(data, settings, rank_model)
+    rank += calculate_preferred(data, settings)
+    if data.repack:
         rank += rank_model.repack if not settings.custom_ranks["repack"].enable else settings.custom_ranks["repack"].rank
-    if parsed_data.proper:
+    if data.proper:
         rank += rank_model.proper if not settings.custom_ranks["proper"].enable else settings.custom_ranks["proper"].rank
-    if parsed_data.remux:
+    if data.remux:
         rank += rank_model.remux if not settings.custom_ranks["remux"].enable else settings.custom_ranks["remux"].rank
-    if parsed_data.is_multi_audio:
+    if data.is_multi_audio:
         rank += rank_model.dubbed if not settings.custom_ranks["dubbed"].enable else settings.custom_ranks["dubbed"].rank
-    if parsed_data.is_multi_subtitle:
+    if data.is_multi_subtitle:
         rank += rank_model.subbed if not settings.custom_ranks["subbed"].enable else settings.custom_ranks["subbed"].rank
     return rank
 
 
-def calculate_preferred(parsed_data: ParsedData, settings: SettingsModel) -> int:
+def calculate_preferred(data: ParsedData, settings: SettingsModel) -> int:
     """Calculate the preferred ranking of a given parsed data."""
     if not settings.preferred or all(pattern is None for pattern in settings.preferred):
         return 0
     return (
         5000
-        if any(pattern.search(parsed_data.raw_title) for pattern in settings.preferred if pattern)  # type: ignore
+        if any(pattern.search(data.raw_title) for pattern in settings.preferred if pattern)  # type: ignore
         else 0
     )
 
 
-def calculate_resolution_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
+def calculate_resolution_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
     """Calculate the resolution ranking of the given parsed data."""
-    if not parsed_data.resolution:
+    if not data.resolution:
         return 0
 
-    resolution: str = parsed_data.resolution[0]
+    resolution: str = data.resolution[0]
     match resolution:
         case "4K":
             return rank_model.uhd if not settings.custom_ranks["uhd"].enable else settings.custom_ranks["uhd"].rank
@@ -77,12 +77,12 @@ def calculate_resolution_rank(parsed_data: ParsedData, settings: SettingsModel,
             return 0
 
 
-def calculate_quality_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
+def calculate_quality_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
     """Calculate the quality ranking of the given parsed data."""
-    if not parsed_data.quality:
+    if not data.quality:
         return 0
 
-    quality = parsed_data.quality[0]
+    quality = data.quality[0]
     match quality:
         case "WEB-DL":
             return rank_model.webdl if not settings.custom_ranks["webdl"].enable else settings.custom_ranks["webdl"].rank
@@ -100,12 +100,12 @@ def calculate_quality_rank(parsed_data: ParsedData, settings: SettingsModel, ran
             return 0
 
 
-def calculate_codec_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
+def calculate_codec_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
     """Calculate the codec ranking of the given parsed data."""
-    if not parsed_data.codec:
+    if not data.codec:
         return 0
 
-    codec = parsed_data.codec[0]
+    codec = data.codec[0]
     match codec:
         case "Xvid" | "H.263" | "VC-1" | "MPEG-2":
             return -1000
@@ -119,12 +119,12 @@ def calculate_codec_rank(parsed_data: ParsedData, settings: SettingsModel, rank_
             return 0
 
 
-def calculate_audio_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
+def calculate_audio_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
     """Calculate the audio ranking of the given parsed data."""
-    if not parsed_data.audio:
+    if not data.audio:
         return 0
 
-    audio_format: str = parsed_data.audio[0]
+    audio_format: str = data.audio[0]
 
     # Remove any unwanted audio formats. We don't support surround sound formats yet.
     # These also make it harder to compare audio formats.
@@ -188,31 +188,31 @@ def calculate_audio_rank(parsed_data: ParsedData, settings: SettingsModel, rank_
             return 0
 
 
-def calculate_other_ranks(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
+def calculate_other_ranks(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
     """Calculate all the other rankings of the given parsed data."""
-    if not ["bitDepth"] and not parsed_data.hdr and not parsed_data.is_complete:
+    if not (data.bitDepth or data.hdr or data.is_complete or data.season or data.episode):
         return 0
 
     total_rank = 0
-    if parsed_data.bitDepth and parsed_data.bitDepth[0] > 8:
+    if data.bitDepth and data.bitDepth[0] > 8:
         total_rank += 2
-    if parsed_data.hdr:
-        if parsed_data.hdr == "HDR":
+    if data.hdr:
+        if data.hdr == "HDR":
             total_rank += settings.custom_ranks["hdr"].rank if settings.custom_ranks["hdr"].enable else rank_model.hdr
-        elif parsed_data.hdr == "HDR10+":
+        elif data.hdr == "HDR10+":
             total_rank += (
                 settings.custom_ranks["hdr10"].rank if settings.custom_ranks["hdr10"].enable else rank_model.hdr10
             )
-        elif parsed_data.hdr == "DV":
+        elif data.hdr == "DV":
             total_rank += (
                 settings.custom_ranks["dolby_video"].rank
                 if settings.custom_ranks["dolby_video"].enable
                 else rank_model.dolby_video
             )
-    if parsed_data.is_complete:
+    if data.is_complete:
         total_rank += 100
-    if parsed_data.season:
-        total_rank += 100 * len(parsed_data.season)
-    if parsed_data.episode:
-        total_rank += 10 * len(parsed_data.episode)
+    if data.season:
+        total_rank += 100 * len(data.season)
+    if data.episode:
+        total_rank += 10 * len(data.episode)
     return total_rank
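
The selection rule repeated throughout `ranker.py` (use the custom rank when `enable` is set, otherwise the ranking model's value) can be sketched in isolation. This is a simplified illustration under stated assumptions: `CustomRank` and `pick_rank` are hypothetical names, the ranking model is reduced to a plain dict instead of a `BaseRankingModel`, and the numeric values are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CustomRank:
    """Hypothetical stand-in for one entry of settings.custom_ranks."""

    enable: bool = False
    fetch: bool = True
    rank: int = 0


def pick_rank(name: str, custom_ranks: Dict[str, CustomRank], model_ranks: Dict[str, int]) -> int:
    """Return the custom rank when enabled, else the ranking model's value."""
    custom = custom_ranks[name]
    return custom.rank if custom.enable else model_ranks[name]


# Illustrative values only (assumptions, not RTN defaults).
model_ranks = {"uhd": 120, "repack": 5}
custom_ranks = {
    "uhd": CustomRank(enable=True, rank=200),
    "repack": CustomRank(enable=False, rank=50),
}

total = pick_rank("uhd", custom_ranks, model_ranks) + pick_rank("repack", custom_ranks, model_ranks)
```

With these values `uhd` contributes its enabled custom rank (200) and `repack` falls back to the model value (5). Factoring the conditional into one helper like this would also shorten the repeated `rank_model.x if not ... else ...` expressions.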
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "rank-torrent-name"
-version = "0.1.3"
+version = "0.1.4"
 description = "Parse Torrents using PTN and Rank them according to your preferences!"
 authors = ["Spoked <dreu.lavelle@gmail.com>"]
 license = "MIT"