feat: add more tests. more tweaks. add batch parsing.
dreulavelle committed Mar 27, 2024
1 parent 9fcf5f8 commit 23d8b2d
Showing 15 changed files with 429 additions and 249 deletions.
5 changes: 4 additions & 1 deletion .github/ISSUE_TEMPLATE/---bug-report.yml
@@ -1,6 +1,9 @@
name: "\U0001F41E Bug Report"
labels: ["kind/bug", "status/triage"]
description: "Rank Torrent Name (RTN) not working the way it is documented?"
title: "[Bug]: "
labels: ["kind/bug", "status/triage"]
assignees:
- dreulavelle

body:
- type: markdown
5 changes: 4 additions & 1 deletion .github/ISSUE_TEMPLATE/---feature-request.yml
@@ -1,6 +1,9 @@
name: "\U0001F381 Feature Request"
labels: ["kind/feature", "status/triage"]
description: "Did you find bugs, errors, or anything that isn't straightforward in the documentation?"
title: "[Feature]: "
labels: ["kind/feature", "status/triage"]
assignees:
- dreulavelle

body:
- type: markdown
5 changes: 4 additions & 1 deletion .github/ISSUE_TEMPLATE/---maintainers.yml
@@ -1,6 +1,9 @@
name: "\U0001F41E Request to Help in Development of RTN"
labels: ["kind/maintainer", "status/triage"]
description: "Want to help in the development of Rank Torrent Name (RTN)?"
title: "[Help]: "
labels: ["kind/maintainer", "status/triage"]
assignees:
- dreulavelle

body:
- type: markdown
9 changes: 9 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,9 @@
# Pull Request Check List

Resolves: #issue-number-here

- [ ] Added **tests** for changed code.
- [ ] Updated **documentation** for changed code.

## Description:

8 changes: 0 additions & 8 deletions .github/workflows/PULL_REQUEST_TEMPLATE.md

This file was deleted.

129 changes: 95 additions & 34 deletions README.md
@@ -131,35 +131,6 @@ Torrent(
lev_ratio=0.95
)
```
## Torrent Parser

You can also parse a torrent title, similar to how PTN works. This is an enhanced version of PTN that also applies RTN's own parsing, including improved episode parsing that covers a much wider range of titles.

Using the example above:

```py
from RTN import parse
parsed = parse("Example.Movie.2020.1080p.BluRay.x264-Example")

print(parsed.parsed_title) # Output: "Example Movie"
print(parsed.year) # Output: [2020]
```

We also set **coherent_types** to `True` for the PTN data that gets combined with RTN's parsed metadata.

## Checking Title Similarity

Sometimes, you might just want to check if two titles match closely enough, without going through the entire ranking process. RTN provides a simple function, `title_match`, for this purpose:

```py
from RTN import title_match

# Check if two titles are similar above a threshold of 0.9
match = title_match("Correct Movie Title 2020", "Correct Movie Title (2020)")
print(match) # Output: True if similarity is above 0.9, otherwise False
```

This functionality is especially useful when you have a list of potential titles and want to find the best match for a given reference title.

## Understanding SettingsModel and RankingModel

Expand Down Expand Up @@ -286,26 +257,76 @@ Keep in mind that these are explicitly set within RTN and are needed in order fo

Create as many `SettingsModel` and `RankingModel` instances as you like to use anywhere in your code. They are meant to be used as a way to version settings for your users.

## Real World Example
# Extras

## Torrent Parser

You can also parse a torrent title, similar to how PTN works. This is an enhanced version of PTN that also applies RTN's own parsing, including improved episode parsing that covers a much wider range of titles.

Using the example above:

```py
from RTN import parse
parsed = parse("Example.Movie.2020.1080p.BluRay.x264-Example")

print(parsed.parsed_data.raw_title) # Output: "Example.Movie.2020.1080p.BluRay.x264-Example"
print(parsed.parsed_data.parsed_title) # Output: "Example Movie"
print(parsed.parsed_data.year) # Output: [2020]
```

> :warning: We also set **coherent_types** to `True` for the PTN data that gets combined with RTN's parsed metadata.
> This just ensures that all the types are uniform. **Everything is either a list of strings or ints, or it's a boolean.**
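
To make the "uniform types" idea concrete, here is a minimal sketch of that kind of normalization. `make_coherent` is a hypothetical helper written only for this illustration; it is not part of RTN or PTN, and PTN's actual `coherent_types` handling may differ in its details.

```python
from typing import Any, Dict


def make_coherent(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap scalar values in a list; leave booleans and lists untouched."""
    coherent: Dict[str, Any] = {}
    for key, value in parsed.items():
        if isinstance(value, bool):
            # Flags such as `proper` or `repack` stay plain booleans.
            coherent[key] = value
        elif isinstance(value, (list, tuple)):
            # Already a sequence: copy it into a list.
            coherent[key] = list(value)
        else:
            # A lone string or int becomes a one-element list.
            coherent[key] = [value]
    return coherent


raw = {"year": 2020, "resolution": "1080p", "season": [1, 2], "proper": True}
print(make_coherent(raw))
```

With uniform types, calling code can always iterate over `year`, `season`, and similar fields without first checking whether it received a single value or a list.
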
## Checking Title Similarity

Sometimes, you might just want to check if two titles match closely enough, without going through the entire ranking process. RTN provides a simple function, `title_match`, for this purpose:

```py
from RTN import title_match

# Check if two titles are similar above a threshold of 0.9
match = title_match("Correct Movie Title 2020", "Correct Movie Title (2020)")
print(match) # Output: True if similarity is above 0.9, otherwise False
```

This functionality is especially useful when you have a list of potential titles and want to find the best match for a given reference title.
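
As a rough illustration of that use case, the sketch below picks the best match from a list of candidate titles. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for RTN's similarity ratio, so the scores will not be identical to what `title_match` computes. `similarity` and `best_match` are hypothetical helper names, not RTN functions.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(reference: str, candidates: List[str], threshold: float = 0.9) -> Optional[str]:
    """Return the candidate closest to `reference`, or None if none reaches `threshold`."""
    if not candidates:
        return None
    # Score every candidate and keep the highest-scoring one.
    score, title = max((similarity(reference, c), c) for c in candidates)
    return title if score >= threshold else None


candidates = ["Correct Movie Title (2020)", "Another Film 2019", "Correct Movie 2"]
print(best_match("Correct Movie Title 2020", candidates))
```

In a real scraper you would likely call `title_match` directly on each candidate and keep the ones that return `True`.
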

## Trash Check

Maybe you just want to use our own garbage collector to weed out bad titles in your current scraping setup?

```py
from RTN import check_trash

if check_trash(raw_title):
    # You can safely remove any title or item from being scraped if this returns True!
    ...
```
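
The same check can be applied to a whole batch of scraped titles. The sketch below mimics the "returns `True` when the title is trash" behaviour with the standard `re` module so that it runs without RTN installed. The three patterns are made up for illustration and are not RTN's actual trash list; `is_trash` and `drop_trash` are hypothetical names.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; RTN ships its own compiled list.
TRASH_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bCAM(?:RIP)?\b", r"\bTELESYNC\b", r"\bSCREENER\b")
]


def is_trash(raw_title: str) -> bool:
    """Return True if any trash pattern is found in the title."""
    if not raw_title or not isinstance(raw_title, str):
        raise TypeError("The input title must be a non-empty string.")
    return any(pattern.search(raw_title) for pattern in TRASH_PATTERNS)


def drop_trash(titles: List[str]) -> List[str]:
    """Keep only the titles that are not flagged as trash."""
    return [title for title in titles if not is_trash(title)]


titles = [
    "Example.Movie.2020.1080p.BluRay.x264-Example",
    "Example.Movie.2020.CAM.x264-Example",
]
print(drop_trash(titles))
```

With RTN installed, replacing `is_trash` with `check_trash` should give the equivalent filter using RTN's own patterns.
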

# Real World Example

Here is a crude example of how you could use RTN in scraping.

```py
from RTN import RTN, Torrent, DefaultRanking, title_match

# Assuming 'settings' is defined somewhere and passed correctly
# Assuming 'settings' is defined somewhere and passed correctly.
rtn = RTN(settings=settings, ranking_model=DefaultRanking())
...
# Define some function for scraping for results..
# Define some function for scraping for results from some API.
if response.ok:
torrents = set()
for stream in response.streams:
if not stream.infohash or not title_match(correct_title, stream.title):
# Skip results that don't match the query
# Skip results that don't match the query.
# We want to do this first to weed out torrents
# that are below the 90% match criteria. (Default is 90%)
continue
torrent: Torrent = rtn.rank(stream.title, stream.infohash)
if torrent and torrent.fetch:
# Skip trash torrents by checking torrent.fetch
# Skip trash torrents by checking `torrent.fetch`.
# If torrent.fetch is True, then it's a good torrent,
# as considered by your ranking profile and settings model.
torrents.add(torrent)

# Sort the list of torrents based on their rank in descending order
@@ -318,6 +339,46 @@ for torrent in sorted_torrents:
print(f"Title: {torrent.parsed_data.parsed_title}, Infohash: {torrent.infohash}, Rank: {torrent.rank}")
```

# ParsedData Structure

Here are all of the attributes of `parsed_data`, along with their default values:

```py
class ParsedData(BaseModel):
    """Parsed data model for a torrent title."""

    raw_title: str
    parsed_title: str
    fetch: bool = False
    is_4k: bool = False
    is_multi_audio: bool = False
    is_multi_subtitle: bool = False
    is_complete: bool = False
    year: List[int] = []
    resolution: List[str] = []
    quality: List[str] = []
    season: List[int] = []
    episode: List[int] = []
    codec: List[str] = []
    audio: List[str] = []
    subtitles: List[str] = []
    language: List[str] = []
    bitDepth: List[int] = []
    hdr: str | bool = False
    proper: bool = False
    repack: bool = False
    remux: bool = False
    upscaled: bool = False
    remastered: bool = False
    directorsCut: bool = False
    extended: bool = False
    excess: list = []
```

This list will continue to grow as we expand functionality, so keep checking back!

> :warning: Don't see something you want in the list? Submit a [Feature Request](https://github.com/dreulavelle/rank-torrent-name/issues/new?assignees=dreulavelle&labels=kind%2Ffeature%2Cstatus%2Ftriage&projects=&template=---feature-request.yml) to have it added!
## Contributing

Contributions to RTN are welcome! Feel free to submit pull requests or open issues to suggest features or report bugs. As we grow, more features will be coming to RTN; there's already a lot planned!
3 changes: 2 additions & 1 deletion RTN/__init__.py
@@ -1,13 +1,14 @@
 from .fetch import check_fetch, check_trash
 from .models import BaseRankingModel, DefaultRanking, ParsedData, SettingsModel
-from .parser import RTN, Torrent, parse, sort, title_match
+from .parser import RTN, Torrent, batch_parse, parse, sort, title_match
 from .patterns import parse_extras
 from .ranker import get_rank

 __all__ = [
     "RTN",
     "Torrent",
     "parse",
+    "batch_parse",
     "get_rank",
     "check_fetch",
     "check_trash",
8 changes: 5 additions & 3 deletions RTN/fetch.py
@@ -1,19 +1,21 @@
 import regex

 from .models import ParsedData, SettingsModel
-from .patterns import TRASH_COMPILED
+from .patterns import IS_TRASH_COMPILED


 def check_trash(raw_title: str) -> bool:
     """Check if the title contains unwanted patterns."""
     if not raw_title or not isinstance(raw_title, str):
         raise TypeError("The input title must be a non-empty string.")
-    return not any(pattern.search(raw_title) for pattern in TRASH_COMPILED)
+    # True if we find any of the trash patterns in the title.
+    # You can safely remove any title from being scraped if this returns True!
+    return any(pattern.search(raw_title) for pattern in IS_TRASH_COMPILED)


 def check_fetch(data: ParsedData, settings: SettingsModel) -> bool:
     """Check user settings and unwanted quality to determine if torrent should be fetched."""
-    if not check_trash(data.raw_title):
+    if check_trash(data.raw_title):
         return False
     if settings.require and any(
         pattern.search(data.raw_title) for pattern in settings.require if pattern  # type: ignore
81 changes: 36 additions & 45 deletions RTN/models.py
@@ -42,36 +42,41 @@ class BaseRankingModel(BaseModel):
The ranking values are used to determine the quality of a media item based on its attributes.
Attributes:
uhd (int): The ranking value for Ultra HD (4K) resolution.
fhd (int): The ranking value for Full HD (1080p) resolution.
hd (int): The ranking value for HD (720p) resolution.
sd (int): The ranking value for SD (480p) resolution.
bluray (int): The ranking value for Blu-ray quality.
hdr (int): The ranking value for HDR quality.
hdr10 (int): The ranking value for HDR10 quality.
dolby_video (int): The ranking value for Dolby video quality.
dts_x (int): The ranking value for DTS:X audio quality.
dts_hd (int): The ranking value for DTS-HD audio quality.
dts_hd_ma (int): The ranking value for DTS-HD Master Audio audio quality.
atmos (int): The ranking value for Dolby Atmos audio quality.
truehd (int): The ranking value for Dolby TrueHD audio quality.
ddplus (int): The ranking value for Dolby Digital Plus audio quality.
ac3 (int): The ranking value for AC3 audio quality.
aac (int): The ranking value for AAC audio quality.
remux (int): The ranking value for remux attribute.
webdl (int): The ranking value for web-dl attribute.
repack (int): The ranking value for repack attribute.
proper (int): The ranking value for proper attribute.
dubbed (int): The ranking value for dubbed attribute.
subbed (int): The ranking value for subbed attribute.
av1 (int): The ranking value for AV1 attribute.
`uhd` (int): The ranking value for Ultra HD (4K) resolution.
`fhd` (int): The ranking value for Full HD (1080p) resolution.
`hd` (int): The ranking value for HD (720p) resolution.
`sd` (int): The ranking value for SD (480p) resolution.
`bluray` (int): The ranking value for Blu-ray quality.
`hdr` (int): The ranking value for HDR quality.
`hdr10` (int): The ranking value for HDR10 quality.
`dolby_video` (int): The ranking value for Dolby video quality.
`dts_x` (int): The ranking value for DTS:X audio quality.
`dts_hd` (int): The ranking value for DTS-HD audio quality.
`dts_hd_ma` (int): The ranking value for DTS-HD Master Audio audio quality.
`atmos` (int): The ranking value for Dolby Atmos audio quality.
`truehd` (int): The ranking value for Dolby TrueHD audio quality.
`ddplus` (int): The ranking value for Dolby Digital Plus audio quality.
`ac3` (int): The ranking value for AC3 audio quality.
`aac` (int): The ranking value for AAC audio quality.
`remux` (int): The ranking value for remux attribute.
`webdl` (int): The ranking value for web-dl attribute.
`repack` (int): The ranking value for repack attribute.
`proper` (int): The ranking value for proper attribute.
`dubbed` (int): The ranking value for dubbed attribute.
`subbed` (int): The ranking value for subbed attribute.
`av1` (int): The ranking value for AV1 attribute.
Note:
- The higher the ranking value, the better the quality of the media item.
- The default ranking values are set to 0, which means that the attribute does not affect the overall rank.
- Users can customize the ranking values based on their preferences and requirements by using inheritance.
"""

# resolution
uhd: int = 0
fhd: int = 0
hd: int = 0
sd: int = 0
uhd: int = 0 # 4K
fhd: int = 0 # 1080p
hd: int = 0 # 720p
sd: int = 0 # 480p
# quality
bluray: int = 0
hdr: int = 0
@@ -98,7 +103,7 @@ class BaseRankingModel(BaseModel):


class DefaultRanking(BaseRankingModel):
"""Default ranking model for users to use."""
"""Default ranking model preset that should cover most common use cases."""

uhd: int = 140
fhd: int = 100
Expand Down Expand Up @@ -209,12 +214,11 @@ def compile_and_validate_patterns(cls, values: dict[str, Any]) -> dict[str, Any]
compiled_patterns = []
for pattern in raw_patterns:
if isinstance(pattern, str):
# Compile the pattern, taking into account your custom syntax for options like case-sensitivity
if pattern.startswith("/") and pattern.endswith("/i"):
if pattern.startswith("/") and pattern.endswith("/i"): # case-insensitive
compiled_patterns.append(regex.compile(pattern[1:-2], regex.IGNORECASE))
elif pattern.startswith("/") and pattern.endswith("/"):
elif pattern.startswith("/") and pattern.endswith("/"): # case-sensitive
compiled_patterns.append(regex.compile(pattern[1:-1]))
else:
else: # case-insensitive by default
compiled_patterns.append(regex.compile(pattern, regex.IGNORECASE))
elif isinstance(pattern, regex.Pattern):
# Keep already compiled patterns as is
@@ -226,16 +230,3 @@ def compile_and_validate_patterns(cls, values: dict[str, Any]) -> dict[str, Any]

class Config:
arbitrary_types_allowed = True

def __getitem__(self, key: str) -> CustomRank:
"""Allows direct access to custom rank settings."""
return self.custom_ranks.get(key, CustomRank())

def __setitem__(self, key: str, value: CustomRank):
"""Enables setting custom rank settings."""
self.custom_ranks[key] = value

def __delitem__(self, key: str):
"""Allows deletion of custom rank settings."""
if key in self.custom_ranks:
del self.custom_ranks[key]
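
The pattern syntax handled by `compile_and_validate_patterns` in the hunk above (`/pattern/i` for case-insensitive, `/pattern/` for case-sensitive, and a bare string that is case-insensitive by default) can be sketched in isolation as follows. This is a standalone approximation that uses the standard `re` module instead of `regex`; `compile_pattern` is a hypothetical name and is not part of RTN.

```python
import re
from typing import Union


def compile_pattern(pattern: Union[str, re.Pattern]) -> re.Pattern:
    """Compile a user-supplied pattern using the slash-delimited convention."""
    if isinstance(pattern, re.Pattern):
        # Keep already compiled patterns as they are.
        return pattern
    if not isinstance(pattern, str):
        raise ValueError(f"Unsupported pattern type: {type(pattern).__name__}")
    if pattern.startswith("/") and pattern.endswith("/i"):
        # "/.../i": explicit case-insensitive match.
        return re.compile(pattern[1:-2], re.IGNORECASE)
    if pattern.startswith("/") and pattern.endswith("/"):
        # "/.../": case-sensitive match.
        return re.compile(pattern[1:-1])
    # Bare string: case-insensitive by default.
    return re.compile(pattern, re.IGNORECASE)


print(bool(compile_pattern("/hdr/i").search("Movie.HDR10.2160p")))  # True
print(bool(compile_pattern("/hdr/").search("Movie.HDR10.2160p")))   # False
print(bool(compile_pattern("hdr").search("Movie.HDR10.2160p")))     # True
```

The slash-delimited form lets a user opt into case-sensitive matching without having to pass flags separately.
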