Two papers accepted at CLiC-it 2020! In ‘Italian Transformers Under the Linguistic Lens’ (with Gabriele Sarti, Dominique Brunato, Felice Dell’Orletta and Giulia Venturi) we present an in-depth investigation of the linguistic knowledge encoded by the Transformer models currently available for the Italian language. In particular, we show that a Multilayer Perceptron is the best model for inferring the amount of information implicitly encoded in the Transformers’ representations. We also observe that BERT-base-italian achieves the best scores on average, but that the linguistic generalization abilities of the examined models vary across specific groups of linguistic phenomena and across textual genres.
In ‘Is Neural Language Model Perplexity Related to Readability?’ (with Chiara Alzetta, Dominique Brunato, Felice Dell’Orletta and Giulia Venturi) we explore the relationship between Neural Language Model (NLM) perplexity and (automatically assessed) sentence readability. Starting from the evidence that NLMs implicitly acquire sophisticated linguistic knowledge from a huge amount of training data, our goal is to investigate whether perplexity is affected by the linguistic features used to automatically assess sentence readability, and whether the two metrics are correlated. Our findings highlight that no significant correlation can be found, either between the two metrics or between the sets of linguistic features that most affect their values.