Our paper ‘Testing the Effectiveness of the Diagnostic Probing Paradigm on Italian Treebanks’ (with Chiara Alzetta, Dominique Brunato, Felice Dell’Orletta and Giulia Venturi) has been accepted for publication in the next issue of the journal Information. In this work we contribute to the debate on the effectiveness of the linguistic probing paradigm by presenting an approach for assessing a suite of probing tasks designed to test the linguistic knowledge implicitly encoded by one of the most prominent neural language models (NLMs), BERT. To this end, we compared the performance of probes when predicting gold and automatically altered values of a set of linguistic features. Our experiments were performed on Italian and were evaluated across BERT’s layers and for sentences of different lengths. Overall, we observed higher performance in the prediction of gold values, which suggests that the probing model is sensitive to the distortion of feature values. However, our experiments also showed that sentence length is a highly influential factor that can confound the probing model’s predictions.
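
To make the comparison between gold and altered values concrete, here is a minimal Python sketch on synthetic data. It is not the code used in the paper. The random vectors stand in for BERT sentence representations, the linear probe trained by gradient descent is an assumed probing model, and shuffling is just one hypothetical way of altering feature values. All names (`fit_linear_probe`, `probe_score`, `run_experiment`) are invented for illustration.

```python
import random


def fit_linear_probe(X, y, epochs=500, lr=0.05):
    """Fit a linear regression probe with batch gradient descent."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - target
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    """Apply the linear probe to every representation in X."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination; at or below 0 means no better than the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def probe_score(X, y, n_train):
    """Train on the first n_train examples, score on the remaining ones."""
    w, b = fit_linear_probe(X[:n_train], y[:n_train])
    return r2_score(y[n_train:], predict(w, b, X[n_train:]))


def run_experiment(seed=0, n=300, n_train=200):
    rng = random.Random(seed)
    # Assumption: synthetic 4-dimensional "sentence representations".
    true_w = [1.0, -2.0, 0.5, 0.0]
    X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n)]
    # Gold feature values: a noisy linear function of the representation.
    gold = [sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0, 0.1)
            for x in X]
    # Altered values (assumed scheme): the same values, shuffled across
    # sentences, which breaks their link to the representations.
    altered = gold[:]
    rng.shuffle(altered)
    return probe_score(X, gold, n_train), probe_score(X, altered, n_train)


gold_r2, altered_r2 = run_experiment(seed=0)
print(f"R2 on gold values:    {gold_r2:.3f}")
print(f"R2 on altered values: {altered_r2:.3f}")
```

If the probe relies on a real association between representations and feature values, its score on gold values should be clearly higher than its score on the shuffled ones. The sketch does not model the sentence-length effect discussed above.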