These are some simple model tests on coded EPINetz Policy Field data (https://epinetz.de/), classifying German politicians' tweets into one or more of 16 policy fields (plus a dummy 'none' category). Performance is measured for OpenAI's GPT-4o and for several RoBERTa models fine-tuned on data coded for the Comparative Agendas Project (CAP). The Comparative Agendas Project classifies texts into policy areas via manual coding and has coding teams for several countries/languages, including German. The RoBERTa models fine-tuned on this data are provided by the CAP Babel Machine project (https://capbabel.poltextlab.com/). The GPT model was tested with zero-shot calls that supply coding instructions for the specific EPINetz categories. The EPINetz categorization scheme is based on sub-categories of the CAP.
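The sketch below illustrates what such a zero-shot call could look like with the OpenAI Python client. The prompt wording, the abbreviated category list, and the output parsing are illustrative assumptions, not the exact prompt used in these tests.

```python
# Minimal sketch of a zero-shot classification call (illustrative assumptions).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Abbreviated placeholder list; the real scheme has 16 policy fields plus 'none'.
EPINETZ_FIELDS = ["defense", "health", "social welfare", "domestic politics", "none"]

SYSTEM_PROMPT = (
    "You are coding German politicians' tweets into EPINetz policy fields. "
    "Assign one or more of the following categories, or 'none' if no policy "
    f"field applies: {', '.join(EPINETZ_FIELDS)}. "
    "Reply with a comma-separated list of category names only."
)

def classify_tweet(text: str) -> list[str]:
    """Send a single tweet for zero-shot classification and parse the reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    answer = response.choices[0].message.content
    return [label.strip() for label in answer.split(",")]

print(classify_tweet("Wir brauchen mehr Investitionen in die Bundeswehr."))
```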
Results show that neither method provides sufficiently reliable outcomes. The GPT model struggles with some categories more than with others: it performs better on categories with an intuitive, common-sense meaning (such as defense policy or health) and underperforms significantly on more intricate categories (such as social welfare or domestic politics). The CAP-trained classifiers perform even worse in most categories and fall short of the authors' reported model accuracies (which range between 0.64 and 0.72, depending on the model). This failure, however, can in large part be explained by the fact that the models were trained on the major CAP categories, while the EPINetz categories are composites of the CAP's minor categories, which leads to categorical overlap and inaccuracies. Nevertheless, the CAP-trained RoBERTa models may provide a decent starting point for fine-tuning a model for the EPINetz categorization scheme of policy areas.
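As a rough illustration of that starting point, the sketch below replaces the classification head of a pre-trained encoder with a fresh multi-label head over the EPINetz categories using Hugging Face transformers. The checkpoint name is a runnable stand-in (a real run would start from one of the CAP Babel Machine checkpoints), and the label list, toy data, and hyperparameters are placeholders rather than the setup evaluated above.

```python
# Hedged sketch: fine-tuning a pre-trained encoder on the EPINetz scheme.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Stand-in checkpoint; substitute a CAP Babel Machine model to reuse its weights.
CHECKPOINT = "xlm-roberta-base"
EPINETZ_LABELS = ["defense", "health", "social_welfare", "none"]  # abbreviated

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(EPINETZ_LABELS),
    problem_type="multi_label_classification",  # tweets can carry several fields
    ignore_mismatched_sizes=True,  # drop any existing head, initialise a fresh one
)

# Toy examples standing in for the coded EPINetz tweets (multi-hot label vectors).
examples = [
    {"text": "Mehr Geld für die Bundeswehr.", "labels": [1.0, 0.0, 0.0, 0.0]},
    {"text": "Guten Morgen aus Berlin!", "labels": [0.0, 0.0, 0.0, 1.0]},
]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

dataset = Dataset.from_list(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="epinetz-finetune",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        report_to="none",
    ),
    train_dataset=dataset,
)
trainer.train()
```

Starting from a CAP-trained checkpoint instead of the base model would keep the encoder weights already adapted to political text, which is the main reason these models could serve as a useful initialisation despite their weak out-of-the-box performance on the EPINetz categories.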