Flexible crowdsourced data labeling solutions for scarce and incomplete annotations

taharallouche/hakeem

🧙‍♂️ hakeem (حَكِيمْ) 🧙‍♂️

Apply state-of-the-art data labelling methods to your own datasets. 🛠️🗃️

The vote-size-matters collective labelling method

If you have an unlabeled dataset of 📷 images, 🔊 sounds, 🎥 videos, or ✉️ texts, and you have collected crowdsourced annotations that you want to aggregate to deduce the correct label for each instance, then hakeem is the solution you're seeking! 🚀

The package implements the size-matters truth-tracking principle 💡, which has consistently outperformed other voter-agnostic aggregation rules 📈. One notable advantage of this method is that it rests on a simple intuition, which makes the results it produces entirely explainable! 🎯🌟

In fact, the method's key principles include:

  1. Granting hesitant voters the flexibility to select more than one possible label. 🤔🔄
  2. Relying on mathematically proven payment schemes to ensure the sincerity of voters. 📊✅
  3. Assigning greater weight to voters who choose fewer labels. After all, a voter familiar with the correct label would likely choose that option, whereas a voter who selects too many labels probably doesn't know the correct answer.⚖️

The package provides several weighting schemes, each of which is optimal under different assumptions. The choice of the right scheme is yours to make!
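To make the intuition concrete, here is a minimal sketch of size-weighted approval aggregation in plain Python. It does not use hakeem's API: the function names (`aggregate`, `inverse_size_weight`, `linear_weight`) and both weighting formulas are hypothetical, chosen only to illustrate that voters who pick fewer labels count for more, and they are not necessarily the schemes shipped with the package.

```python
def inverse_size_weight(ballot_size, n_labels):
    """Hypothetical scheme: a ballot naming k labels gets weight 1/k."""
    return 1.0 / ballot_size


def linear_weight(ballot_size, n_labels):
    """Hypothetical scheme: weight falls linearly with ballot size;
    a ballot naming every label gets weight 0."""
    return float(n_labels - ballot_size)


def aggregate(ballots, labels, weight=inverse_size_weight):
    """Return (winning label, per-label scores) for one instance.

    Each ballot is the set of labels one voter approved. Every approved
    label receives the ballot's weight, which depends only on its size.
    """
    scores = {label: 0.0 for label in labels}
    for ballot in ballots:
        if not ballot:
            continue  # skip voters who gave no annotation
        w = weight(len(ballot), len(labels))
        for label in ballot:
            scores[label] += w
    winner = max(scores, key=scores.get)
    return winner, scores


labels = ["cat", "dog", "fox"]
ballots = [
    {"cat"}, {"cat"},                              # two confident voters
    {"dog", "fox"}, {"dog", "fox"}, {"dog", "fox"},  # three hesitant voters
]

winner, scores = aggregate(ballots, labels)
print(winner, scores)
```

In this toy example, counting approvals alone would give "dog" and "fox" three votes each against two for "cat", whereas weighting by ballot size lets the two single-label ballots carry the decision. Passing `weight=linear_weight` swaps in the other illustrative scheme without changing the rest of the code.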

Installation

You can install the hakeem package directly from PyPI using pip:

pip install hakeem

Note: paper results reproduction

The code for reproducing the original AAAI-2022 paper's experiments 📚🧪📊, benchmarking the vote-size-matters crowdsourcing data labelling method, has been moved to a dedicated repo.
