Adding a new site to the Agent :: The Code

The code is by far the most complicated of the three steps. I would suggest starting with the template file (siteTemplate.py) and then referencing other siteXXX.py or networkXXX.py files if need be.

The site code file has 2 major functions, one to search() for results on your new site, and one to update() metadata once a specific result is matched. The best case scenario is to use the original site's own search function to find results. If they don't have one, you will have to get creative and either set the search function up for direct URL matching, or if it's a small site, you might be able to just work your way through the pages listing all the scenes. For this example, let's assume the site you're adding has a search function.

A small example of a search function might look like this:

def search(results, encodedTitle, searchTitle, siteNum, lang, searchDate):
    req = PAutils.HTTPRequest(PAsearchSites.getSearchSearchURL(siteNum) + encodedTitle)
    searchResults = HTML.ElementFromString(req.text)
    for searchResult in searchResults.xpath('//div[@class="sceneWrapper"]'):
        titleNoFormatting = searchResult.xpath('.//div[@class="sceneTitle"]')[0].text_content().strip()
        releaseDate = parse(searchResult.xpath('.//div[@class="reldate"]')[0].text_content().strip()).strftime('%Y-%m-%d')
        curID = PAutils.Encode(searchResult.xpath('.//a/@href')[0])

        if searchDate:
            score = 100 - Util.LevenshteinDistance(searchDate, releaseDate)
        else:
            score = 100 - Util.LevenshteinDistance(searchTitle.lower(), titleNoFormatting.lower())

        results.Append(MetadataSearchResult(id='%s|%d' % (curID, siteNum), name='%s [%s] %s' % (titleNoFormatting, PAsearchSites.getSearchSiteName(siteNum), releaseDate), score=score, lang=lang))
    return results

Let's break this down so we understand each piece:

def search(results, encodedTitle, searchTitle, siteNum, lang, searchDate):

This will be basically the same on every site you add. These are the bits of information that have already been processed and are being passed to your search function. I have another page where I tried to explain the purpose of each of them, so I won't go into details other than to say this line probably shouldn't be changed.

req = PAutils.HTTPRequest(PAsearchSites.getSearchSearchURL(siteNum) + encodedTitle)
searchResults = HTML.ElementFromString(req.text)

This grabs the raw HTML code from your site's search page and plops it in a big variable named searchResults. It would be the same as you going to the site in your web browser, typing something in the search and pressing ENTER, then on the results page, right-clicking the page and choosing "View page source". We are now going to pick through this HTML code to find all the individual scenes that were returned by your search.
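
To make the URL part of that request a bit more concrete, here is a minimal sketch in plain Python 3, outside of Plex. The base URL is made up for illustration (in the agent it comes from PAsearchSites.getSearchSearchURL(siteNum)), and using quote() is an assumption about how a title ends up as encodedTitle; the agent may encode it differently.

```python
from urllib.parse import quote

# Hypothetical base URL, for illustration only. In the agent this string
# comes from PAsearchSites.getSearchSearchURL(siteNum).
SEARCH_URL = "https://www.example.com/search?q="


def build_search_url(title):
    # Percent-encode the title so spaces and punctuation survive in a URL.
    return SEARCH_URL + quote(title)


print(build_search_url("Video 1 starring Girl"))
```

The printed URL is what you would see in your browser's address bar after submitting the same search by hand.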

for searchResult in searchResults.xpath('//div[@class="sceneWrapper"]'):

This is a for() loop. It will run the code inside of it once for each <div class="sceneWrapper"> it finds in the raw HTML code. There are lots of guides out there about xpath format, so I won't detail it here. The idea here is that each search result will have some HTML tag surrounding the whole wad of information. Here is a simplified sample of a search results page's raw HTML:

<html>
    <head>
        <title>Search results for TEST</title>
        <script>"Here's a bunch of javascript for Google ad tracking or something"</script>
    </head>
    <body>
        <h1>Search results for TEST</h1>
        <div class="sceneWrapper">
           <a href="/videos/video1.html" class="sceneLink">
              <img src="/images/video1_preview.jpg" alt="Video #1!"><br />
              <div class="sceneTitle">       Video #1 starring Girl of Your Dreams!!       </div>
              <div class="reldate">Feb. 1st 2019</div>
           </a>
        </div>
        <div class="sceneWrapper">
           <a href="/videos/video2.html" class="sceneLink">
              <img src="/images/video2_preview.jpg" alt="Video #2!"><br />
              <div class="sceneTitle">       Video #2 starring Girl of Your Dreams!!       </div>
              <div class="reldate">Feb. 8th 2019</div>
           </a>
        </div>
        <div class="sceneWrapper">
           <a href="/videos/video3.html" class="sceneLink">
              <img src="/images/video3_preview.jpg" alt="Video #3!"><br />
              <div class="sceneTitle">       Video #3 starring Girl of Your Dreams!!       </div>
              <div class="reldate">Feb. 15th 2019</div>
           </a>
        </div>
        <div class="sceneWrapper">
           <a href="/videos/video4.html" class="sceneLink">
              <img src="/images/video4_preview.jpg" alt="Video #4!"><br />
              <div class="sceneTitle">       Video #4 starring Girl of Your Dreams!!       </div>
              <div class="reldate">Feb. 22nd 2019</div>
           </a>
        </div>
    </body>
</html>

Whatever we typed into the search box got us 4 results, so the for() loop will run through 4 times since <div class="sceneWrapper"> is found 4 times in the search result HTML code.
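
If you want to see the loop count for yourself without loading Plex, here is a rough stand-alone sketch using Python 3's built-in xml.etree.ElementTree instead of the agent's HTML helper. That swap is an assumption made purely so the example runs anywhere: ElementTree only understands a subset of xpath and needs well-formed markup, so the sample below is trimmed down from the one above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, well-formed version of the sample search results page.
SAMPLE = """
<body>
  <h1>Search results for TEST</h1>
  <div class="sceneWrapper"><a href="/videos/video1.html">one</a></div>
  <div class="sceneWrapper"><a href="/videos/video2.html">two</a></div>
  <div class="sceneWrapper"><a href="/videos/video3.html">three</a></div>
  <div class="sceneWrapper"><a href="/videos/video4.html">four</a></div>
</body>
""".strip()

root = ET.fromstring(SAMPLE)

# One element per search result, just like the for() loop in search().
wrappers = root.findall(".//div[@class='sceneWrapper']")

# ElementTree cannot select attributes with @href, so read them with .get().
hrefs = [wrapper.find(".//a").get("href") for wrapper in wrappers]

print(len(wrappers))
print(hrefs)
```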

titleNoFormatting = searchResult.xpath('.//div[@class="sceneTitle"]')[0].text_content().strip()

This is looking for the title of the scene. The period (.) at the beginning of the xpath means it will start at whichever <div class="sceneWrapper"> this pass of the loop is on, and look for the <div class="sceneTitle"> inside of that. This ensures the title it returns matches the other information we're about to gather, and doesn't get mixed up between all the scenes that were matched. .text_content() will return the words between the <div> and </div> tags, and .strip() will clear out the white space (spaces, tabs, line breaks) leaving just the title nice and pretty.
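
Here is a small sketch of why starting the search from the current wrapper matters. It again uses Python 3's built-in ElementTree as a stand-in for the agent's HTML helper, and the text_content() function below is a hypothetical approximation of the real .text_content() method, written just for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <div class="sceneWrapper">
    <div class="sceneTitle">       Video #1       </div>
  </div>
  <div class="sceneWrapper">
    <div class="sceneTitle">
        Video #2
    </div>
  </div>
</body>
""".strip()

root = ET.fromstring(SAMPLE)


def text_content(element):
    # Rough stand-in for .text_content(): join all text inside the element.
    return "".join(element.itertext())


scoped, unscoped = [], []
for wrapper in root.findall(".//div[@class='sceneWrapper']"):
    # Searching from the wrapper finds that wrapper's own title.
    own_title = wrapper.find(".//div[@class='sceneTitle']")
    scoped.append(text_content(own_title).strip())
    # Searching from the top of the page always hits the first title.
    first_title = root.find(".//div[@class='sceneTitle']")
    unscoped.append(text_content(first_title).strip())

print(scoped)
print(unscoped)
```

The scoped list has a different title for each result, while the unscoped list repeats the first title every time, which is exactly the mix-up the leading period avoids.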

releaseDate = parse(searchResult.xpath('.//div[@class="reldate"]')[0].text_content().strip()).strftime('%Y-%m-%d')

Same deal for the release date. The parse() function can take a date in pretty much any format and convert it to a programming object. Finally the programming object is spat back out using .strftime('%Y-%m-%d') so that it formats the date in the same way every time, even if different sites list the date differently from each other.
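
To show the idea of "any format in, one format out", here is a standard-library-only sketch. It is not how parse() works internally: normalize_date() is a hypothetical helper that only handles dates shaped like the ones in the sample HTML, whereas parse() copes with far more formats.

```python
import re
from datetime import datetime


def normalize_date(raw):
    # Drop ordinal suffixes: "1st" -> "1", "22nd" -> "22".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", raw.strip())
    # This format string only fits dates like "Feb. 1 2019".
    parsed = datetime.strptime(cleaned, "%b. %d %Y")
    # Emit the same YYYY-MM-DD layout the agent uses for every site.
    return parsed.strftime("%Y-%m-%d")


print(normalize_date("Feb. 1st 2019"))
print(normalize_date("Feb. 22nd 2019"))
```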

curID = PAutils.Encode(searchResult.xpath('.//a/@href')[0])

This locates the URL of the specific scene page, then reformats it a little so it can be saved for later.
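
The internals of PAutils.Encode aren't shown on this page, so the sketch below only illustrates the general idea of "reformatting a URL so it can be saved for later". It assumes a URL-safe base64 encoding, which may or may not be what the agent really does; encode() and decode() are hypothetical helpers.

```python
import base64


def encode(text):
    # Turn the URL into a token with no slashes or pipes in it.
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def decode(token):
    # Reverse of encode(): recover the original URL from the token.
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


scene_url = "/videos/video1.html"
token = encode(scene_url)

print(token)
print(decode(token))
```

Whatever encoding is used, the point is that the saved value can be safely combined with other fields and turned back into the scene URL later.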

if searchDate:
    score = 100 - Util.LevenshteinDistance(searchDate, releaseDate)
else:
    score = 100 - Util.LevenshteinDistance(searchTitle.lower(), titleNoFormatting.lower())

This assigns the score to the result. If your filename had a release date, it is compared with the release date from the site's search results to find the best match. If your filename had no date, the search text is compared to the scene title instead. The more similar the dates (or titles), the higher the score.
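
If you are curious what the score looks like in practice, here is a self-contained sketch. The levenshtein() function is a textbook edit-distance implementation written for this example, standing in for Util.LevenshteinDistance (assumed to compute the same thing), and score() is a hypothetical wrapper around the if/else above.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def score(search_title, search_date, result_title, result_date):
    # Same decision as in search(): prefer the date when the filename had one.
    if search_date:
        return 100 - levenshtein(search_date, result_date)
    return 100 - levenshtein(search_title.lower(), result_title.lower())


print(score("", "2019-02-08", "Video #2", "2019-02-08"))
print(score("video 2", "", "Video #2", "2019-02-08"))
```

Every character that has to be added, removed or changed to turn one string into the other knocks one point off the score.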

results.Append(MetadataSearchResult(id='%s|%d' % (curID, siteNum), name='%s [%s] %s' % (titleNoFormatting, PAsearchSites.getSearchSiteName(siteNum), releaseDate), score=score, lang=lang))

This adds the result to the pile to be displayed in Plex. You likely won't deviate from this format much. There are a few examples where I tucked in an extra piece of information from the search results, or made a change to the name (like adding Network/SiteName), but generally speaking this will remain the same. After it adds this result to the list, it circles back to the top of the for() loop and does it again for the next instance of <div class="sceneWrapper">.
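
The two format strings in that line can be tried on their own. In the sketch below, build_result() and split_result_id() are hypothetical helpers, and splitting the id back apart on the pipe character is an assumption about how the pieces would be read later; this page doesn't cover that step.

```python
def build_result(cur_id, site_num, title, site_name, release_date):
    # Same format strings as the results.Append(...) line.
    result_id = "%s|%d" % (cur_id, site_num)
    name = "%s [%s] %s" % (title, site_name, release_date)
    return result_id, name


def split_result_id(result_id):
    # Assumed inverse: pull the two pieces back out of the id.
    cur_id, site_num = result_id.split("|")
    return cur_id, int(site_num)


result_id, name = build_result("abc", 7, "Video #1", "Example Site", "2019-02-01")
print(result_id)
print(name)
print(split_result_id(result_id))
```

This only works cleanly if curID itself never contains a pipe, which is one reason the URL gets encoded before it is stored.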

Once all the search results have been looped through, it returns the collected results back to the main function in Plex.
