Lost in Labels: An Ongoing Quest to Optimize Text-to-Text Label Selection for Classification

Michele Papucci, Alessio Miaschi, Felice Dell'Orletta

October 2023

Abstract

In this paper, we evaluate how label selection influences the performance of a Sequence-to-Sequence Transformer model in a classification task. Our study investigates whether the choice of words used to represent classification categories affects the model’s performance, and whether there is a relationship between that performance and the selected words. To this end, we fine-tuned an Italian T5 model on topic classification using various labels. Our results indicate that different label choices can significantly impact the model’s performance. However, we did not find a clear answer as to how these choices affect the model’s performance, highlighting the need for further research on optimizing label selection.
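
To make the setup concrete, the following is a minimal sketch of how label selection enters a text-to-text classification pipeline: each class is represented by a word, and the model is trained to generate that word. The topic ids, label words, example sentences, and helper names (`LABEL_SETS`, `make_pairs`, `decode_prediction`) are all hypothetical and are not taken from the paper; the actual fine-tuning step is omitted.

```python
# Hypothetical class ids and two alternative label vocabularies.
# These are illustrative only, not the label sets evaluated in the paper.
LABEL_SETS = {
    "descriptive": {0: "sport", 1: "cucina"},
    "arbitrary": {0: "alfa", 1: "beta"},
}


def make_pairs(examples, label_map):
    """Turn (text, class_id) examples into (source, target) string pairs."""
    return [(text, label_map[class_id]) for text, class_id in examples]


def decode_prediction(generated, label_map):
    """Map a generated string back to a class id, or None if it matches no label."""
    inverse = {word: class_id for class_id, word in label_map.items()}
    return inverse.get(generated.strip().lower())


# Hypothetical Italian training examples: (text, class_id).
examples = [
    ("La squadra ha vinto la partita.", 0),
    ("Aggiungi il sale all'acqua.", 1),
]

# The same data yields different target strings under each label set.
descriptive_pairs = make_pairs(examples, LABEL_SETS["descriptive"])
arbitrary_pairs = make_pairs(examples, LABEL_SETS["arbitrary"])
```

Under this framing, the only thing that changes between experimental conditions is the target string the model must generate, which is the variable whose effect the paper studies.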

Type

Conference paper

Publication

In Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)

Alessio Miaschi

Full-time researcher (RTD) in Natural Language Processing