These are some simple model tests on coded EPINetz Policy Field data (https://epinetz.de/), classifying German politicians' tweets into one or more of 16 policy fields (plus a dummy 'none' category). Performance is measured for OpenAI's GPT-4o and for several RoBERTa models fine-tuned on data coded for the Comparative Agendas Project (CAP). The Comparative Agendas Project classifies texts into policy areas via manual coding and has coding teams for several countries/languages, including German. The RoBERTa models fine-tuned on this data are provided by the CAP Babel Machine project (https://capbabel.poltextlab.com/). The GPT model was tested with zero-shot calls that supply coding instructions for the specific EPINetz categories. The EPINetz categorization scheme is based on sub-categories of the CAP.
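The sketch below illustrates what such a zero-shot call could look like with the OpenAI Python client. The prompt wording, the abbreviated category list, and the output parsing are illustrative assumptions, not the exact prompt used in these tests.

```python
# Minimal sketch of a zero-shot classification call (illustrative assumptions).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Abbreviated placeholder list; the real scheme has 16 policy fields plus 'none'.
EPINETZ_FIELDS = ["defense", "health", "social welfare", "domestic politics", "none"]

SYSTEM_PROMPT = (
    "You are coding German politicians' tweets into EPINetz policy fields. "
    "Assign one or more of the following categories, or 'none' if no policy "
    f"field applies: {', '.join(EPINETZ_FIELDS)}. "
    "Reply with a comma-separated list of category names only."
)

def classify_tweet(text: str) -> list[str]:
    """Send a single tweet for zero-shot classification and parse the reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    answer = response.choices[0].message.content
    return [label.strip() for label in answer.split(",")]

print(classify_tweet("Wir brauchen mehr Investitionen in die Bundeswehr."))
```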
Results show that neither method provides sufficiently reliable outcomes. The GPT model struggles with some categories more than with others: it performs better on categories with an intuitive, common-sense meaning (such as defense policy or health) and underperforms significantly on more intricate categories (such as social welfare or domestic politics). The CAP-trained classifiers perform even worse in most categories and fall short of the authors' reported model accuracies (which range between 0.64 and 0.72, depending on the model). This failure, however, can in large part be explained by the fact that the models were trained on the major CAP categories, while the EPINetz categories are composites of the CAP's minor categories, which leads to categorical overlap and inaccuracies. Nevertheless, the CAP-trained RoBERTa models may provide a decent starting point for fine-tuning a model for the EPINetz categorization scheme of policy areas.
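As a rough illustration of that starting point, the sketch below replaces the classification head of a pre-trained encoder with a fresh multi-label head over the EPINetz categories using Hugging Face transformers. The checkpoint name is a runnable stand-in (a real run would start from one of the CAP Babel Machine checkpoints), and the label list, toy data, and hyperparameters are placeholders rather than the setup evaluated above.

```python
# Hedged sketch: fine-tuning a pre-trained encoder on the EPINetz scheme.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Stand-in checkpoint; substitute a CAP Babel Machine model to reuse its weights.
CHECKPOINT = "xlm-roberta-base"
EPINETZ_LABELS = ["defense", "health", "social_welfare", "none"]  # abbreviated

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT,
    num_labels=len(EPINETZ_LABELS),
    problem_type="multi_label_classification",  # tweets can carry several fields
    ignore_mismatched_sizes=True,  # drop any existing head, initialise a fresh one
)

# Toy examples standing in for the coded EPINetz tweets (multi-hot label vectors).
examples = [
    {"text": "Mehr Geld für die Bundeswehr.", "labels": [1.0, 0.0, 0.0, 0.0]},
    {"text": "Guten Morgen aus Berlin!", "labels": [0.0, 0.0, 0.0, 1.0]},
]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

dataset = Dataset.from_list(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="epinetz-finetune",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        report_to="none",
    ),
    train_dataset=dataset,
)
trainer.train()
```

Starting from a CAP-trained checkpoint instead of the base model would keep the encoder weights already adapted to political text, which is the main reason these models could serve as a useful initialisation despite their weak out-of-the-box performance on the EPINetz categories.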