fix: update parse tests. other improvements.
Spoked authored and Spoked committed Mar 27, 2024
1 parent c4ab21c commit b3a92fe
Showing 8 changed files with 187 additions and 107 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -32,4 +32,4 @@ jobs:
run: poetry publish --build
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_TOKEN }}
continue-on-error: true
continue-on-error: true # Still want coverage tests to be run even if publishing fails
40 changes: 26 additions & 14 deletions README.md
@@ -1,11 +1,21 @@
<center>
<div align="center">

# Rank Torrent Name (RTN)
<h1>Rank Torrent Name (RTN)</h1>

[![PyPI version](https://badge.fury.io/py/rank-torrent-name.svg)](https://badge.fury.io/py/rank-torrent-name) ![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/dreulavelle/rank-torrent-name/battery.yml?style=flat) ![GitHub License](https://img.shields.io/github/license/dreulavelle/rank-torrent-name)
[![Coverage Status](https://coveralls.io/repos/github/dreulavelle/rank-torrent-name/badge.svg?branch=main)](https://coveralls.io/github/dreulavelle/rank-torrent-name?branch=main)
<a href="https://codecov.io/gh/dreulavelle/rank-torrent-name">
<img src="https://codecov.io/gh/dreulavelle/rank-torrent-name/graph/badge.svg?token=V9S89GSUKM"/>
</a>

</center>
<a href="https://badge.fury.io/py/rank-torrent-name">
<img src="https://badge.fury.io/py/rank-torrent-name.svg" alt="PyPI version" />
</a>

<img src="https://img.shields.io/github/actions/workflow/status/dreulavelle/rank-torrent-name/battery.yml" alt="GitHub Actions Workflow Status" />

<img src="https://img.shields.io/github/license/dreulavelle/rank-torrent-name" alt="GitHub License" />

</div>
<br>

**Rank Torrent Name (RTN)** is a Python library designed to parse and rank torrent names based on customizable criteria. It allows users to define their preferences for filtering and ranking torrents, providing a detailed analysis of each torrent's metadata. RTN is perfect for automating the selection of torrents based on quality, resolution, audio, and more.

@@ -79,10 +89,10 @@ We cover a lot already, so users are able to add their own custom regex patterns
#### Understanding Fetch and Enable:

- `fetch`: Determines if RTN should consider a torrent for downloading based on the attribute. True means RTN will fetch torrents matching this criterion.
- `enable`: Controls whether the custom rank value is used in the overall ranking calculation. Disabling it reverts to the default ranking for that attribute.
- `enable`: Controls whether the custom rank value is used in the overall ranking calculation. Disabling it falls back to the value from the ranking model you set. This is useful for toggling custom ranks on and off from a user's perspective.
- `rank`: Sets the rank value used to grade that item.

For instance, if we detect a title is **4K** or **2160p** then we use the `uhd` ranking, and add **+120** points. The same goes for the rest of the strings in `custom_ranks`.
For example, if we detect a title is **4K** or **2160p** then we use the `uhd` ranking, and add **+120** points. The same goes for the rest of the strings in `custom_ranks`.
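The interaction between `enable` and `rank` can be sketched as follows. This is a minimal illustration rather than RTN's actual classes: `CustomRank`, `Model`, and `effective_rank` are hypothetical stand-ins, and the numeric values are made up. It follows the same conditional that appears in the `RTN/ranker.py` changes of this commit, where the custom rank applies only when `enable` is true.

```python
from dataclasses import dataclass


@dataclass
class CustomRank:
    """Hypothetical stand-in for one entry of `custom_ranks`."""
    enable: bool = False
    fetch: bool = True
    rank: int = 0


@dataclass
class Model:
    """Hypothetical stand-in for a ranking model with a single attribute."""
    uhd: int = 100  # illustrative value only


def effective_rank(custom: CustomRank, model_value: int) -> int:
    # Use the custom rank only when it is enabled; otherwise fall back to the model.
    return custom.rank if custom.enable else model_value


model = Model()
uhd_setting = CustomRank(enable=True, rank=120)
print(effective_rank(uhd_setting, model.uhd))  # custom rank is used

uhd_setting.enable = False  # toggled at runtime
print(effective_rank(uhd_setting, model.uhd))  # model value is used
```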

Settings can be easily adjusted at runtime if needed. To enable or disable a specific rank dynamically:

@@ -105,7 +115,7 @@ torrent = rtn.rank("Example.Movie.2020.1080p.BluRay.x264-Example", "infohash1234
3. **Inspecting the Torrent Object:** The returned `Torrent` object includes parsed data and a rank. Access its properties to understand its quality:

```python
print(f"Title: {torrent.parsed_data.parsed_title}, Rank: {torrent.rank}")
print(f"Title: {torrent.data.parsed_title}, Rank: {torrent.rank}")
```

### Sorting Torrents
@@ -125,7 +135,7 @@ A `Torrent` object encapsulates metadata about a torrent, such as its title, par
Torrent(
raw_title="Example.Movie.2020.1080p.BluRay.x264-Example",
infohash="infohash123456",
parsed_data=ParsedData(parsed_title='Example Movie', ...),
data=ParsedData(parsed_title='Example Movie', ...),
fetch=True,
rank=150,
lev_ratio=0.95
@@ -269,9 +279,9 @@ Using the example above:
from RTN import parse
parsed = parse("Example.Movie.2020.1080p.BluRay.x264-Example")

print(parsed.parsed_data.raw_title) # Output: "Example.Movie.2020.1080p.BluRay.x264-Example"
print(parsed.parsed_data.parsed_title) # Output: "Example Movie"
print(parsed.parsed_data.year) # Output: [2020]
print(parsed.data.raw_title) # Output: "Example.Movie.2020.1080p.BluRay.x264-Example"
print(parsed.data.parsed_title) # Output: "Example Movie"
print(parsed.data.year) # Output: [2020]
```

> :warning: We also set **coherent_types** to `True` for the PTN data that gets combined with RTN parsed metadata.
@@ -336,12 +346,14 @@ rtn = RTN(settings=settings, ranking_model=DefaultRanking())

# Example usage
for torrent in sorted_torrents:
print(f"Title: {torrent.parsed_data.parsed_title}, Infohash: {torrent.infohash}, Rank: {torrent.rank}")
print(f"Title: {torrent.data.parsed_title}, Infohash: {torrent.infohash}, Rank: {torrent.rank}")
```

# ParsedData Structure

Here is all of the attributes of `parsed_data` along with their default values:
Here are all of the attributes of `data` from the `Torrent` object, along with their default values.

This is accessible at `torrent.data` in the `Torrent` object. For example: `torrent.data.resolution`

```py
class ParsedData(BaseModel):
33 changes: 16 additions & 17 deletions RTN/parser.py
@@ -19,7 +19,7 @@ class Torrent(BaseModel):
Attributes:
`raw_title` (str): The original title of the torrent.
`infohash` (str): The SHA-1 hash identifier of the torrent.
`parsed_data` (ParsedData): Metadata extracted from the torrent title including PTN parsing and additional extras.
`data` (ParsedData): Metadata extracted from the torrent title including PTN parsing and additional extras.
`fetch` (bool): Indicates whether the torrent meets the criteria for fetching based on user settings.
`rank` (int): The computed ranking score of the torrent based on user-defined preferences.
`lev_ratio` (float): The Levenshtein ratio comparing the parsed title and the raw title for similarity.
@@ -30,18 +30,16 @@ class Torrent(BaseModel):

raw_title: str
infohash: str
parsed_data: ParsedData
data: ParsedData
fetch: bool = False
rank: int = 0
lev_ratio: float = 0.0

@validator("raw_title", "infohash")
def validate_strings(cls, v):
"""Ensures raw_title and infohash are strings."""
if not v:
raise ValueError("Value cannot be empty.")
if not isinstance(v, str):
raise ValueError("Value must be a string.")
if not v or not isinstance(v, str):
raise TypeError("The title and infohash must be non-empty strings.")
return v

@validator("infohash")
@@ -92,12 +90,10 @@ def rank(self, raw_title: str, infohash: str) -> Torrent:
raise ValueError("The infohash must be a valid SHA-1 hash and 40 characters in length.")

parsed_data = parse(raw_title)
if not parsed_data:
raise ValueError(f"Failed to parse the title: {raw_title}")
return Torrent(
raw_title=raw_title,
infohash=infohash,
parsed_data=parsed_data,
data=parsed_data,
fetch=check_fetch(parsed_data, self.settings),
rank=get_rank(parsed_data, self.settings, self.ranking_model),
lev_ratio=Levenshtein.ratio(parsed_data.parsed_title.lower(), raw_title.lower()),
@@ -117,11 +113,11 @@ def parse(raw_title: str) -> ParsedData:
if not raw_title or not isinstance(raw_title, str):
raise TypeError("The input title must be a non-empty string.")

parsed_dict: dict[str, Any] = PTN.parse(raw_title, coherent_types=True) # Imagine this returns a dict
extras: dict[str, Any] = parse_extras(raw_title) # Returns additional fields as a dict
full_data = {**parsed_dict, **extras} # Merge PTN parsed data with extras
full_data["raw_title"] = raw_title # Add the raw title to the data
full_data["parsed_title"] = parsed_dict.get("title") # Add the parsed title to the data
parsed_dict: dict[str, Any] = PTN.parse(raw_title, coherent_types=True)
extras: dict[str, Any] = parse_extras(raw_title)
full_data = {**parsed_dict, **extras} # Merge PTN parsed data with RTN extras.
full_data["raw_title"] = raw_title
full_data["parsed_title"] = parsed_dict.get("title")
return ParsedData(**full_data)


@@ -130,20 +126,21 @@ def parse_chunk(chunk: List[str]) -> List[ParsedData]:
return [parse(title) for title in chunk]
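
The chunked, threaded approach that `batch_parse` takes with `parse_chunk`, `chunk_size`, and `max_workers` can be sketched in isolation. This is a minimal, self-contained illustration, not the library's code: `chunked_map` is a hypothetical name, and `str.lower` stands in for `parse`. Unlike the diff, which iterates with `as_completed` (completion order), this sketch uses `executor.map` so results come back in input order; the rest of `batch_parse` is not shown in this hunk, so how it orders results is not visible here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def chunked_map(items: List[str], fn: Callable[[str], T], chunk_size: int = 50, max_workers: int = 4) -> List[T]:
    """Apply `fn` to every item, one chunk per submitted task, preserving input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1.")
    chunks = [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]
    results: List[T] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields chunk results in submission order.
        for chunk_result in executor.map(lambda chunk: [fn(x) for x in chunk], chunks):
            results.extend(chunk_result)
    return results


titles = [f"Example.Movie.{year}.1080p" for year in range(2000, 2005)]
print(chunked_map(titles, str.lower, chunk_size=2, max_workers=2))
```

Capping `max_workers` keeps the thread count predictable when many titles are parsed at once, which appears to be the motivation for the new parameter.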


def batch_parse(titles: List[str], chunk_size: int = 50) -> List[ParsedData]:
def batch_parse(titles: List[str], chunk_size: int = 50, max_workers: int = 4) -> List[ParsedData]:
"""
Parses a list of torrent titles in batches for improved performance.
Args:
titles (List[str]): A list of torrent titles to parse.
chunk_size (int): The number of titles to process in each batch.
max_workers (int): The maximum number of worker threads to use for parsing.
Returns:
List[ParsedData]: A list of ParsedData objects for each title.
"""
chunks = [titles[i : i + chunk_size] for i in range(0, len(titles), chunk_size)]
parsed_data = []
with ThreadPoolExecutor() as executor:
with ThreadPoolExecutor(max_workers=max_workers) as executor:
future_to_chunk = {executor.submit(parse_chunk, chunk): chunk for chunk in chunks}
for future in as_completed(future_to_chunk):
chunk_result = future.result()
@@ -163,10 +160,12 @@ def title_match(correct_title: str, raw_title: str, threshold: float = 0.9) -> b
Returns:
bool: True if the titles are similar above the specified threshold; False otherwise.
"""
if not correct_title or not raw_title:
if not (correct_title or raw_title):
raise ValueError("Both titles must be provided.")
if not isinstance(correct_title, str) or not isinstance(raw_title, str):
raise TypeError("Both titles must be strings.")
if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
raise ValueError("The threshold must be a float between 0 and 1.")
return Levenshtein.ratio(correct_title.lower(), raw_title.lower()) >= threshold
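
A note on the changed guard in `title_match`: `not (correct_title or raw_title)` is true only when both titles are empty, while the previous form `not correct_title or not raw_title` rejected either one being empty. The sketch below shows the difference and the new threshold check. It is an illustration under stated assumptions: `both_missing`, `either_missing`, and `similar` are hypothetical names, and `difflib.SequenceMatcher` is used as a stdlib stand-in for `Levenshtein.ratio`, so the similarity values are not the same metric.

```python
from difflib import SequenceMatcher


def both_missing(a: str, b: str) -> bool:
    # The condition as written in the new line of the diff.
    return not (a or b)


def either_missing(a: str, b: str) -> bool:
    # The condition as written in the old line of the diff.
    return not a or not b


def similar(correct_title: str, raw_title: str, threshold: float = 0.9) -> bool:
    """Threshold check using difflib as a stand-in for Levenshtein.ratio."""
    if not isinstance(threshold, (int, float)) or not 0 <= threshold <= 1:
        raise ValueError("The threshold must be a float between 0 and 1.")
    ratio = SequenceMatcher(None, correct_title.lower(), raw_title.lower()).ratio()
    return ratio >= threshold


print(both_missing("", "Example Movie"))    # False: one empty title passes this check
print(either_missing("", "Example Movie"))  # True: one empty title is rejected
print(similar("Example Movie", "example movie"))
```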


2 changes: 1 addition & 1 deletion RTN/patterns.py
@@ -7,7 +7,7 @@ def compile_patterns(patterns):
return [regex.compile(pattern, regex.IGNORECASE) for pattern in patterns]
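
How compiled patterns like these are typically applied can be sketched as follows. This is an assumption-laden illustration: it uses the standard-library `re` module rather than the third-party `regex` package from the diff, `is_trash` is a hypothetical helper, and only the single CAM pattern quoted in this hunk is included.

```python
import re


def compile_patterns(patterns: list[str]) -> list[re.Pattern[str]]:
    # Same shape as the helper in the diff, but using stdlib `re` rather than `regex`.
    return [re.compile(pattern, re.IGNORECASE) for pattern in patterns]


# The CAM pattern quoted in the diff.
TRASH = compile_patterns([r"\b(?:H[DQ][ .-]*)?CAM(?:H[DQ])?(?:[ .-]*Rip)?\b"])


def is_trash(raw_title: str) -> bool:
    """Hypothetical helper: True if any trash pattern matches the title."""
    return any(pattern.search(raw_title) for pattern in TRASH)


print(is_trash("Example.Movie.2020.HDCAM.x264-Example"))         # True
print(is_trash("Example.Movie.2020.1080p.BluRay.x264-Example"))  # False
```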


# Pattern for identifying unwanted quality. This will set `parsed_data.fetch`.
# Pattern for identifying unwanted quality. This will set `data.fetch`.
IS_TRASH_COMPILED = compile_patterns(
[
r"\b(?:H[DQ][ .-]*)?CAM(?:H[DQ])?(?:[ .-]*Rip)?\b",
82 changes: 41 additions & 41 deletions RTN/ranker.py
@@ -3,12 +3,12 @@
from .models import BaseRankingModel, ParsedData, SettingsModel


def get_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
def get_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
"""
Calculate the ranking of the given parsed data.
Parameters:
parsed_data (ParsedData): The parsed data object containing information about the torrent title.
data (ParsedData): The parsed data object containing information about the torrent title.
settings (SettingsModel): The user settings object containing custom ranking models.
rank_model (BaseRankingModel): The base ranking model used for calculating the ranking.
@@ -19,47 +19,47 @@ def get_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseR
ValueError: If the parsed data is empty.
TypeError: If the parsed data is not a ParsedData object.
"""
if not parsed_data:
if not data:
raise ValueError("Parsed data cannot be empty.")
if not isinstance(parsed_data, ParsedData):
if not isinstance(data, ParsedData):
raise TypeError("Parsed data must be an instance of ParsedData.")

rank: int = calculate_resolution_rank(parsed_data, settings, rank_model)
rank += calculate_quality_rank(parsed_data, settings, rank_model)
rank += calculate_codec_rank(parsed_data, settings, rank_model)
rank += calculate_audio_rank(parsed_data, settings, rank_model)
rank += calculate_other_ranks(parsed_data, settings, rank_model)
rank += calculate_preferred(parsed_data, settings)
if parsed_data.repack:
rank: int = calculate_resolution_rank(data, settings, rank_model)
rank += calculate_quality_rank(data, settings, rank_model)
rank += calculate_codec_rank(data, settings, rank_model)
rank += calculate_audio_rank(data, settings, rank_model)
rank += calculate_other_ranks(data, settings, rank_model)
rank += calculate_preferred(data, settings)
if data.repack:
rank += rank_model.repack if not settings.custom_ranks["repack"].enable else settings.custom_ranks["repack"].rank
if parsed_data.proper:
if data.proper:
rank += rank_model.proper if not settings.custom_ranks["proper"].enable else settings.custom_ranks["proper"].rank
if parsed_data.remux:
if data.remux:
rank += rank_model.remux if not settings.custom_ranks["remux"].enable else settings.custom_ranks["remux"].rank
if parsed_data.is_multi_audio:
if data.is_multi_audio:
rank += rank_model.dubbed if not settings.custom_ranks["dubbed"].enable else settings.custom_ranks["dubbed"].rank
if parsed_data.is_multi_subtitle:
if data.is_multi_subtitle:
rank += rank_model.subbed if not settings.custom_ranks["subbed"].enable else settings.custom_ranks["subbed"].rank
return rank


def calculate_preferred(parsed_data: ParsedData, settings: SettingsModel) -> int:
def calculate_preferred(data: ParsedData, settings: SettingsModel) -> int:
"""Calculate the preferred ranking of a given parsed data."""
if not settings.preferred or all(pattern is None for pattern in settings.preferred):
return 0
return (
5000
if any(pattern.search(parsed_data.raw_title) for pattern in settings.preferred if pattern) # type: ignore
if any(pattern.search(data.raw_title) for pattern in settings.preferred if pattern) # type: ignore
else 0
)


def calculate_resolution_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
def calculate_resolution_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
"""Calculate the resolution ranking of the given parsed data."""
if not parsed_data.resolution:
if not data.resolution:
return 0

resolution: str = parsed_data.resolution[0]
resolution: str = data.resolution[0]
match resolution:
case "4K":
return rank_model.uhd if not settings.custom_ranks["uhd"].enable else settings.custom_ranks["uhd"].rank
@@ -77,12 +77,12 @@ def calculate_resolution_rank(parsed_data: ParsedData, settings: SettingsModel,
return 0


def calculate_quality_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
def calculate_quality_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
"""Calculate the quality ranking of the given parsed data."""
if not parsed_data.quality:
if not data.quality:
return 0

quality = parsed_data.quality[0]
quality = data.quality[0]
match quality:
case "WEB-DL":
return rank_model.webdl if not settings.custom_ranks["webdl"].enable else settings.custom_ranks["webdl"].rank
@@ -100,12 +100,12 @@ def calculate_quality_rank(parsed_data: ParsedData, settings: SettingsModel, ran
return 0


def calculate_codec_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
def calculate_codec_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
"""Calculate the codec ranking of the given parsed data."""
if not parsed_data.codec:
if not data.codec:
return 0

codec = parsed_data.codec[0]
codec = data.codec[0]
match codec:
case "Xvid" | "H.263" | "VC-1" | "MPEG-2":
return -1000
@@ -119,12 +119,12 @@ def calculate_codec_rank(parsed_data: ParsedData, settings: SettingsModel, rank_
return 0


def calculate_audio_rank(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
def calculate_audio_rank(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
"""Calculate the audio ranking of the given parsed data."""
if not parsed_data.audio:
if not data.audio:
return 0

audio_format: str = parsed_data.audio[0]
audio_format: str = data.audio[0]

# Remove any unwanted audio formats. We don't support surround sound formats yet.
# These also make it harder to compare audio formats.
@@ -188,31 +188,31 @@ def calculate_audio_rank(parsed_data: ParsedData, settings: SettingsModel, rank_
return 0


def calculate_other_ranks(parsed_data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
def calculate_other_ranks(data: ParsedData, settings: SettingsModel, rank_model: BaseRankingModel) -> int:
"""Calculate all the other rankings of the given parsed data."""
if not ["bitDepth"] and not parsed_data.hdr and not parsed_data.is_complete:
if not ["bitDepth"] and not data.hdr and not data.is_complete:
return 0

total_rank = 0
if parsed_data.bitDepth and parsed_data.bitDepth[0] > 8:
if data.bitDepth and data.bitDepth[0] > 8:
total_rank += 2
if parsed_data.hdr:
if parsed_data.hdr == "HDR":
if data.hdr:
if data.hdr == "HDR":
total_rank += settings.custom_ranks["hdr"].rank if settings.custom_ranks["hdr"].enable else rank_model.hdr
elif parsed_data.hdr == "HDR10+":
elif data.hdr == "HDR10+":
total_rank += (
settings.custom_ranks["hdr10"].rank if settings.custom_ranks["hdr10"].enable else rank_model.hdr10
)
elif parsed_data.hdr == "DV":
elif data.hdr == "DV":
total_rank += (
settings.custom_ranks["dolby_video"].rank
if settings.custom_ranks["dolby_video"].enable
else rank_model.dolby_video
)
if parsed_data.is_complete:
if data.is_complete:
total_rank += 100
if parsed_data.season:
total_rank += 100 * len(parsed_data.season)
if parsed_data.episode:
total_rank += 10 * len(parsed_data.episode)
if data.season:
total_rank += 100 * len(data.season)
if data.episode:
total_rank += 10 * len(data.episode)
return total_rank
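
One detail worth noting in `calculate_other_ranks`: the guard `if not ["bitDepth"] and ...` tests a non-empty list literal, which is always truthy, so the early `return 0` never fires in either the old or the new version. The sketch below demonstrates that and the additive bonuses. It uses a hypothetical, reduced `Parsed` dataclass rather than RTN's `ParsedData`, and omits the HDR branches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parsed:
    """Hypothetical, reduced stand-in for ParsedData."""
    bitDepth: List[int] = field(default_factory=list)
    is_complete: bool = False
    season: List[int] = field(default_factory=list)
    episode: List[int] = field(default_factory=list)


def other_bonus(data: Parsed) -> int:
    # Mirrors the fixed-value bonuses in the diff; HDR handling is omitted here.
    total = 0
    if data.bitDepth and data.bitDepth[0] > 8:
        total += 2
    if data.is_complete:
        total += 100
    total += 100 * len(data.season)
    total += 10 * len(data.episode)
    return total


print(not ["bitDepth"])  # False: a non-empty list literal is always truthy
print(other_bonus(Parsed(bitDepth=[10], season=[1], episode=[1, 2])))  # 2 + 100 + 20 = 122
```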
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "rank-torrent-name"
version = "0.1.3"
version = "0.1.4"
description = "Parse Torrents using PTN and Rank them according to your preferences!"
authors = ["Spoked <dreu.lavelle@gmail.com>"]
license = "MIT"