Feature: Evaluate/Revise Submission Ranking Algorithms #148

Open
nautbot opened this issue Mar 31, 2018 · 0 comments
Labels
enhancement New feature or request


nautbot commented Mar 31, 2018

Evaluate the existing submission ranking algorithms and identify new or already-available replacements.


"Hot" Submissions

Review the Submission entity's updateRanking() function: https://github.com/TheRealGD/therealgd/blob/develop/src/Entity/Submission.php

The Reddit "Hot" algorithm may be a suitable replacement or starting point, given the lower voting activity here (typically <1500 votes per submission):

from datetime import datetime, timedelta
from math import log

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

def hot(ups, downs, date):
    s = ups - downs
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = epoch_seconds(date) - 1134028003
    return round(sign * order + (seconds / 45000), 7)

On a quick review, this formula does not penalize "controversial" submissions, i.e. those with a more balanced ratio of upvotes to downvotes, because it only looks at the net score (ups - downs):

hot(200, 5, datetime(2018, 1, 1)) = 8463.1077457
hot(1000, 500, datetime(2018, 1, 1)) = 8463.5166811
hot(1000, 5, datetime(2018, 1, 1)) = 8463.8155342

In this case, with three submissions made at the same time, the second submission (2:1 vote ratio) outranks the first (40:1 vote ratio).
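To make the point above concrete, here is a minimal sketch that ranks the same three submissions with the formula as quoted. The `submissions` dict and its labels are only illustrative; the formula itself is unchanged. It also shows that a 1000/500 submission scores exactly the same as a 500/0 one, since only the net score enters the calculation.

```python
from datetime import datetime
from math import log

EPOCH = datetime(1970, 1, 1)

def epoch_seconds(date):
    # Seconds since the Unix epoch for a naive datetime
    return (date - EPOCH).total_seconds()

def hot(ups, downs, date):
    # Formula as quoted above: log-scaled net score plus a time term
    s = ups - downs
    order = log(max(abs(s), 1), 10)
    sign = (s > 0) - (s < 0)
    seconds = epoch_seconds(date) - 1134028003
    return round(sign * order + seconds / 45000, 7)

posted = datetime(2018, 1, 1)

# Illustrative labels mapped to (ups, downs), all posted at the same time
submissions = {
    "200/5": (200, 5),
    "1000/500": (1000, 500),
    "1000/5": (1000, 5),
}

# Highest score first
ranked = sorted(
    submissions,
    key=lambda name: hot(submissions[name][0], submissions[name][1], posted),
    reverse=True,
)
print(ranked)

# Only the net score matters, so these two are indistinguishable
print(hot(1000, 500, posted) == hot(500, 0, posted))
```

Because every submission here shares the same timestamp, the time term cancels out and the ordering is decided purely by log10 of the net score.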

Perhaps something like this, which weights "quality" (the vote ratio) rather than just the raw net score, would be better (I don't know jack about PHP, so I'll stick to Python examples):

from datetime import datetime, timedelta
from math import log, sqrt

epoch = datetime(1970, 1, 1)

def epoch_seconds(date):
    td = date - epoch
    return td.days * 86400 + td.seconds + (float(td.microseconds) / 1000000)

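def hot(ups, downs, date):
    score = ups - downs
    votes = ups + downs
    # Share of all votes that the net score represents, in [0, 1];
    # float() keeps this a true division on Python 2 as well
    ratio = float(abs(score)) / max(votes, 1)
    # sign alone carries the direction, so net-negative scores rank lower
    sign = (score > 0) - (score < 0)
    # Log-scaled net score, matching the sample results below
    order = log(max(abs(score), 1), 10)
    seconds = epoch_seconds(date) - 1134028003
    return round(sign * ratio * order + (seconds / 45000), 7)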

The result favors submission quality: the first case (40:1) now ranks above the second (2:1):

hot(200, 5, datetime(2018, 1, 1)) = 8462.9960367
hot(1000, 500, datetime(2018, 1, 1)) = 8461.7173678
hot(1000, 5, datetime(2018, 1, 1)) = 8463.7857051

It still gives weight to sheer vote volume, as with the two 40:1 submissions below:

hot(40, 1, datetime(2018, 1, 1)) = 8462.3311628
hot(200, 5, datetime(2018, 1, 1)) = 8462.9960367
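
As a rough check of the ratio-weighted idea, the sketch below ranks all of the cases above plus one heavily downvoted case. Treat it as a sketch under stated assumptions rather than a final formula: the name `hot_weighted` and the extra (5, 200) case are hypothetical, the magnitude term is taken as log10 of the absolute net score (which is what the figures above correspond to), and the ratio is taken as an absolute value so that net-negative submissions are pushed down rather than up.

```python
from datetime import datetime
from math import log

EPOCH = datetime(1970, 1, 1)

def hot_weighted(ups, downs, date):
    # Hypothetical name; a ratio-weighted variant of the "hot" formula
    score = ups - downs
    votes = ups + downs
    # Share of all votes that the net score represents, in [0, 1]
    ratio = abs(score) / max(votes, 1)
    # Direction of the vote term: +1, 0, or -1
    sign = (score > 0) - (score < 0)
    # Log-scaled magnitude of the net score
    order = log(max(abs(score), 1), 10)
    seconds = (date - EPOCH).total_seconds() - 1134028003
    return round(sign * ratio * order + seconds / 45000, 7)

posted = datetime(2018, 1, 1)

# (ups, downs) pairs; (5, 200) is an added case to exercise negative scores
cases = [(40, 1), (200, 5), (1000, 500), (1000, 5), (5, 200)]
scores = {c: hot_weighted(c[0], c[1], posted) for c in cases}

# Highest score first
ranked = sorted(cases, key=scores.get, reverse=True)
for ups, downs in ranked:
    print(ups, downs, scores[(ups, downs)])
```

With identical timestamps, the high-ratio submissions sort by volume, the 2:1 submission falls below all of them, and the net-negative one lands last.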