Evaluating Transformer Models for Punctuation Restoration in Italian

Abstract

In this paper, we propose an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tunings with training data of different domains and sizes and test the resulting models in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model with respect to specific punctuation marks.
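The abstract describes fine-tuning a BERT-base model for punctuation restoration. As a rough illustration only, the sketch below shows one common way to frame this task as token classification with Hugging Face Transformers; the checkpoint name, label set, and helper function are assumptions made for the example and are not taken from the paper.

```python
# Minimal sketch: punctuation restoration as token classification with a
# BERT-base model. Checkpoint and label set are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]  # assumed punctuation classes

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "dbmdz/bert-base-italian-cased", num_labels=len(LABELS)
)

def restore_punctuation(text: str) -> list[tuple[str, str]]:
    """Predict a punctuation label for each word (weights untrained here)."""
    words = text.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits  # shape: (1, seq_len, num_labels)
    preds = logits.argmax(-1)[0].tolist()
    word_ids = enc.word_ids(0)
    out, seen = [], set()
    for idx, wid in enumerate(word_ids):
        if wid is None or wid in seen:
            continue  # skip special tokens and sub-word continuations
        seen.add(wid)
        out.append((words[wid], LABELS[preds[idx]]))
    return out

print(restore_punctuation("ciao come stai oggi"))
```

In this setup, fine-tuning on transcriptions from different domains and sizes, as the paper does, would amount to training this classification head (and the encoder) on differently composed label-annotated corpora.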

Publication
In Proceedings of the 5th Workshop on Natural Language for Artificial Intelligence (NL4AI @ AIxIA 2021)
Alessio Miaschi