Two papers accepted at CLiC-it 2023! In ‘Lost in Labels’ (with Michele Papucci and Felice Dell’Orletta) we evaluate how label selection influences the performance of a Sequence-to-Sequence Transformer model on a classification task. Our study investigates whether the choice of words used to represent classification categories affects the model’s performance, and whether there is a relationship between the model’s performance and the selected words. To answer these questions, we fine-tuned an Italian T5 model on topic classification using various sets of labels. Our results indicate that different label choices can significantly impact the model’s performance. However, we did not find a clear pattern explaining how these choices affect performance, which highlights the need for further research on optimizing label selection.
In ‘Unmasking the Wordsmith: Revealing Author Identity through Reader Reviews’ (with Chiara Alzetta, Felice Dell’Orletta, Chiara Fazzone and Giulia Venturi) we propose a novel task called Book Author Prediction, which consists of predicting the author of a book from the writing style of user-generated reviews. To this end, we introduce the Literary Voices Corpus (LVC), a dataset of Italian book reviews, and use it to train and test machine learning models. Our study offers insights for developing user-centric systems that recommend leisure reading based on individual readers' interests and writing styles.