Abstract

In this paper, we propose an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tunings with different training data and dataset sizes and test the resulting models in both in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model with respect to specific punctuation marks.


Citation
@inproceedings{miaschi2021punctuation_restoration,
  title={Evaluating Transformer Models for Punctuation Restoration in Italian},
  author={Miaschi, Alessio and Ravelli, Andrea Amelio and Dell'Orletta, Felice},
  booktitle={Proceedings of the 5th Workshop on Natural Language for Artificial Intelligence (NL4AI 2021)},
  year={2021}
}