Lost in Labels: An Ongoing Quest to Optimize Text-to-Text Label Selection for Classification

Abstract

In this paper, we evaluate how label selection influences the performance of a sequence-to-sequence Transformer model on a classification task. We investigate whether the choice of words used to represent classification categories affects the model’s performance, and whether there is a relationship between that performance and the selected words. To this end, we fine-tuned an Italian T5 model on topic classification using various labels. Our results indicate that different label choices can significantly impact the model’s performance. However, we did not find a clear pattern in how these choices affect performance, which highlights the need for further research on optimizing label selection.
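As an illustration of the setup the abstract describes, the following minimal sketch shows how a topic classification dataset could be cast as text-to-text pairs under different label choices. It is not the authors' code: the label sets, the topic names, the task prefix, and the helper names `build_pairs` and `decode_prediction` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical label sets (assumed for illustration, not taken from the paper).
# Each one maps a topic class id to the word the model is trained to generate.
LABEL_SETS: Dict[str, Dict[int, str]] = {
    "original": {0: "sport", 1: "politica", 2: "tecnologia"},
    "shuffled": {0: "politica", 1: "tecnologia", 2: "sport"},
    "arbitrary": {0: "rosso", 1: "verde", 2: "blu"},
}


def build_pairs(
    examples: List[Tuple[str, int]],
    label_map: Dict[int, str],
    prefix: str = "classifica argomento: ",  # assumed task prefix
) -> List[Tuple[str, str]]:
    """Turn (text, class_id) examples into (input_text, target_text) pairs."""
    return [(prefix + text, label_map[class_id]) for text, class_id in examples]


def decode_prediction(generated: str, label_map: Dict[int, str]) -> int:
    """Map a generated label word back to its class id, or -1 if unrecognized."""
    inverse = {word: class_id for class_id, word in label_map.items()}
    return inverse.get(generated.strip().lower(), -1)
```

In such a setup, the same examples would be paired with each label set in turn, a separate model fine-tuned per set, and the resulting scores compared; only the target words change between runs.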

Publication
In Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)
Alessio Miaschi
PostDoc in Natural Language Processing