Apply state-of-the-art data labelling methods to your own datasets.🛠️🗃️
If you have an unlabeled dataset of 📷 images, 🔊 sounds, 🎥 videos, or ✉️ texts, and you have collected crowdsourced annotations that you want to aggregate optimally to deduce the correct label for each instance, then hakeem is the solution you're seeking! 🚀
The package implements the size-matters truth tracking principle, 💡 which has consistently shown superior performance compared to other voter-agnostic aggregation rules 📈. One notable advantage of this method is its reliance on a simple intuition, making the results it produces entirely explainable! 🎯🌟
In fact, the method's key principles include:
- Granting hesitant voters the flexibility to select more than one possible label. 🤔🔄
- Relying on mathematically proven payment schemes to ensure sincerity of voters.📊✅
- Assigning greater weight to voters who choose fewer labels. After all, a voter familiar with the correct label would likely choose that option, whereas a voter who selects too many labels probably doesn't know the correct answer.⚖️
Various weighting schemes are provided to the user, with each one being optimal under different assumptions. The choice of the right scheme is yours to make!
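To make the intuition concrete, here is a minimal, self-contained sketch of size-dependent approval aggregation. It does not use hakeem's API: the function names (`aggregate`, `inverse_size_weight`, `uniform_weight`) and the `1 / ballot_size` weight are assumptions chosen purely for illustration, and they are not necessarily among the schemes the package ships.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Set

Ballot = Set[Hashable]
WeightFn = Callable[[int, int], float]


def inverse_size_weight(ballot_size: int, n_labels: int) -> float:
    """Illustrative (hypothetical) scheme: the fewer labels chosen, the larger the weight."""
    return 1.0 / ballot_size


def uniform_weight(ballot_size: int, n_labels: int) -> float:
    """Baseline: plain approval voting, every ballot counts the same."""
    return 1.0


def aggregate(
    ballots: Iterable[Ballot],
    n_labels: int,
    weight: WeightFn = inverse_size_weight,
) -> Hashable:
    """Return the label with the highest weighted approval score."""
    scores = defaultdict(float)
    for ballot in ballots:
        if not ballot:
            continue  # an empty ballot approves nothing
        w = weight(len(ballot), n_labels)
        for label in ballot:
            scores[label] += w
    if not scores:
        raise ValueError("no non-empty ballots to aggregate")
    return max(scores, key=scores.get)


# Five annotators label one image; hesitant annotators pick two labels.
ballots = [{"cat"}, {"cat"}, {"dog", "fox"}, {"dog", "fox"}, {"dog", "wolf"}]

print(aggregate(ballots, n_labels=4))                         # size-aware weights
print(aggregate(ballots, n_labels=4, weight=uniform_weight))  # plain approval
```

In this toy example, plain approval voting picks "dog" (three approvals, all from hesitant annotators), while the size-aware weights pick "cat", because the two annotators who committed to a single label count for more. Swapping the `weight` function is how a different weighting assumption would be plugged into this sketch.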
You can install the hakeem package directly from PyPI using pip:
pip install hakeem
The code for reproducing the original AAAI-2022 paper's experiments 📚🧪📊, benchmarking the vote-size-matters crowdsourcing data labelling method, has been moved to a dedicated repo.