Abstract
In this paper, we propose an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tunings with different training data and sizes and test the resulting models in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks.
Citation
@inproceedings{miaschi2021punctuation_restoration,
title={Evaluating Transformer Models for Punctuation Restoration in Italian},
author={Miaschi, Alessio and Ravelli, Andrea Amelio and Dell'Orletta, Felice},
booktitle={Proceedings of the 5th Workshop on Natural Language for Artificial Intelligence (NL4AI 2021)},
year={2021}
}