Lesson at the Autumn School in AI, PhD in Digital Humanities (Università di Genova)

On November 11 I will give a lesson with my colleague Cristiano Ciaccio during the Autumn School in AI of the PhD Program in Digital Humanities (Università di Genova). The Autumn School is open to doctoral and master's students in the humanities. All the information about the school can be found here: https://digitalhumanities.phd.unige.it/node/1230

Title: A Theoretical and Practical Introduction to Neural Language Models: Evaluating and Exploring their Linguistic Abilities

Abstract: The field of Natural Language Processing (NLP) has witnessed remarkable advancements in recent years, driven largely by the shift from traditional approaches to state-of-the-art neural network-based algorithms. Among these, Large-scale Language Models (LLMs) have shown remarkable performance across a wide range of tasks and in generating coherent and contextually relevant texts. This improvement, however, comes at the cost of interpretability, since deep neural models offer little transparency about their inner workings and their abilities. This talk will offer an overview of Language Models and the recent advancements achieved in this area, with a specific focus on studies investigating their implicit linguistic abilities and how these insights can enhance our understanding of model behaviour across various tasks and applications. In the second part, we will move to a more practical hands-on session, providing participants with an introduction to how these models can be used and explored in practice.

October 2025 · Alessio Miaschi

Best Student Paper Award @ CLiC-it 2025

Our paper ‘Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond’, led by PhD student Cristiano Ciaccio, won the CLiC-it 2025 Best Student Paper Award! You can read the paper at the following link: https://clic2025.unica.it/wp-content/uploads/2025/09/25_main_long.pdf.

September 2025 · Alessio Miaschi

LM4DH @ RANLP 2025 Invited Talk

Last week I had the pleasure of being an invited speaker at the LM4DH Workshop, co-located with RANLP 2025. You can find the slides of my talk here: Slides.

September 2025 · Alessio Miaschi

Cruciverb-IT (Shared task at EVALITA 2026)

I am happy to announce that I will be co-organizing a shared task at EVALITA 2026, the evaluation campaign of NLP and Speech Tools for Italian, which will take place in Bari on February 26-27, 2026. For more information, please visit the shared task web page: Cruciverb-IT: Crossword Solving at EVALITA 2026

September 2025 · Alessio Miaschi

CLiC-it 2025 Papers

I’m happy to share that I got three papers accepted at CLiC-it 2025:

Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond (Ciaccio C., Sarti G., Miaschi A., Dell'Orletta F.)

The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence (Calderaro S., Miaschi A., Dell'Orletta F.)

MAIA: a Benchmark for Multimodal AI Assessment (Testa D., Bonetta G., Bernardi R., Bondielli A., Lenci A., Miaschi A., Passaro L., Magnini B.)

More info coming soon!

September 2025 · Alessio Miaschi

EMNLP 2025 Findings Paper

I’m happy to share that I got a paper accepted at the Findings of EMNLP 2025! The paper, entitled ‘All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark’ (with Testa D., Bonetta G., Bernardi R., Bondielli A., Lenci A., Passaro L. and Magnini B.) introduces MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos. MAIA differs from other available video benchmarks in its design, its reasoning categories, the metric it uses, and the language and culture of the videos. It evaluates Vision Language Models (VLMs) on two aligned tasks: a visual statement verification task and an open-ended visual question-answering task, both on the same set of video-related questions. It considers twelve reasoning categories that aim to disentangle language and vision relations by highlighting when one of the two alone encodes sufficient information to solve the tasks, when both are needed, and when the full richness of the short video is essential rather than just a part of it. Thanks to its carefully thought-out design, it evaluates VLMs’ consistency and visually grounded natural language comprehension and generation simultaneously through an aggregated metric. Last but not least, the video collection has been carefully selected to reflect Italian culture, and the language data are produced by native speakers.

August 2025 · Alessio Miaschi

Co-organizing EVALITA 2026

I am happy to announce that I will be co-organizing the 2026 edition of EVALITA. EVALITA is a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The deadline for submitting your task proposal is July 28, 2025. For more information, please visit the EVALITA call for tasks webpage: EVALITA 2026 Call for Tasks

July 2025 · Alessio Miaschi

ACL 2025 Papers

I’m happy to share that I got three papers accepted at ACL 2025: one at the main conference and two in the Findings!

Main Conference: Evaluating Lexical Proficiency in Neural Language Models (with Ciaccio C. and Dell'Orletta F.)

Findings: Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models (with Ciaccio C., Sartor M. and Dell'Orletta F.)

Findings: Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors (with Pedrotti A., Papucci M., Ciaccio C., Puccetti G., Dell'Orletta F. and Esuli A.)

More info coming soon!

May 2025 · Alessio Miaschi

Invited Talk at NLP4RE @ REFSQ 2025 (Barcelona, Spain)

On April 7 I will give an invited talk during the REFSQ 2025 Workshop NLP4RE.

Title: Evaluating Linguistic Abilities of Neural Language Models

Abstract: The field of Natural Language Processing (NLP) has witnessed remarkable advancements in recent years, driven largely by the shift from traditional approaches to state-of-the-art neural network-based algorithms. Among these, Large-scale Language Models (LLMs) have shown remarkable performance across a wide range of tasks and in generating coherent and contextually relevant texts. This improvement, however, comes at the cost of interpretability, since deep neural models offer little transparency about their inner workings and their abilities. In response, a growing body of research is dedicated to evaluating and interpreting LLMs, aiming to shed light on the inner workings and linguistic abilities encoded by these systems. This talk explores recent studies that shed light on these abilities, highlighting how such insights enhance our understanding of model behaviour across various applications.

April 2025 · Alessio Miaschi

Talk at FAIR Spoke Workshop 2025

On February 20 I will give a talk during the FAIR Spoke Workshop 2025 at Sapienza Università di Roma.

Title: Controllable Text Generation for Evaluating LLMs' Linguistic Competence

Abstract: In this talk, I will provide an overview of the results obtained in the context of the FAIR project (Spoke 5) focused on the evaluation of the linguistic abilities of Large Language Models (LLMs). Specifically, I will highlight results from research on Controllable Text Generation (CTG), with a specific focus on the assessment of LLMs’ abilities to generate text while adhering to specific linguistic constraints.

February 2025 · Alessio Miaschi

NAACL 2025 Findings Paper

Our paper ‘Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation’ (with Luca Moroni, Giovanni Puccetti, Pere-Lluís Huguet Cabot, Andrei Stefan Bejgu, Edoardo Barba, Felice Dell’Orletta, Andrea Esuli and Roberto Navigli) has been accepted at NAACL 2025 (Findings)! In this work, we explore various vocabulary adaptation techniques to tailor English LLMs for the Italian language. We introduce Semantic Alignment Vocabulary Adaptation (SAVA), a novel method that learns a neural mapping to accomplish vocabulary substitution, achieving state-of-the-art performance on several downstream tasks. We adapted two LLMs: Mistral-7b-v0.1, reducing token fertility by 25%, and Llama-3.1-8b, optimizing the vocabulary and reducing the number of parameters by 1 billion. We show that, after the adaptation of the vocabulary, these models can recover their performance with a relatively limited stage of continual training on the target language. Finally, we test the adapted models’ capabilities on several multi-choice and generative tasks.

January 2025 · Alessio Miaschi

WWW 2025 Paper

Our paper ‘Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation’ (with Lorenzo Cima, Amaury Trujillo, Marco Avvenuti, Felice Dell’Orletta and Stefano Cresci) has been accepted at WWW 2025! In this paper we propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct an LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.

January 2025 · Alessio Miaschi

Talk at AI Seminars 2024/25

On December 9 I gave a talk during the AI Seminars of the PhD program in Digital Humanities of the University of Genova. The aim of the seminars is to present experiences and practices of using AI in different fields, to increase knowledge and develop awareness about the opportunities and risks of using AI in our research fields.

Title: Evaluating Linguistic Abilities of Neural Language Models

Abstract: The field of Natural Language Processing (NLP) has witnessed remarkable advancements in recent years, driven largely by the shift from traditional approaches to state-of-the-art neural network-based algorithms. Among these, Large-scale Language Models (LLMs) have shown remarkable performance across a wide range of tasks and in generating coherent and contextually relevant texts. This improvement, however, comes at the cost of interpretability, since deep neural models offer little transparency about their inner workings and their abilities. In response, a growing body of research is dedicated to analyzing and interpreting LLMs, aiming to shed light on the inner workings and linguistic abilities encoded by these systems. This talk will be divided into two parts. The first part offers an overview of Language Models (LMs) and the recent advancements achieved by these models in the past few years. In the second part, we will focus on recent studies that examine these models’ implicit linguistic abilities, exploring how these insights can enhance our understanding of model behaviour across various tasks and applications.

December 2024 · Alessio Miaschi

CLiC-it 2024 Paper

Our paper ‘Controllable Text Generation To Evaluate Linguistic Abilities of Italian LLMs’ (with Cristiano Ciaccio, Felice Dell’Orletta and Giulia Venturi) has been accepted to CLiC-it 2024! In this paper we propose a new evaluation framework leveraging the potential of Controllable Text Generation. Our approach evaluates the models’ capacity to generate sentences that adhere to specific linguistic constraints and their ability to recognize the linguistic properties of their own generated sentences, also in terms of consistency with the specified constraints. We tested our approach on six Italian LLMs using various linguistic constraints.

October 2024 · Alessio Miaschi

NL4AI 2024 Paper

Our paper ‘Fantastic Labels and Where to Find Them: Attention-Based Label Selection for Text-to-Text Classification’ (with Michele Papucci and Felice Dell’Orletta) has been accepted to NL4AI 2024! In this work, we introduce a novel method for selecting well-performing label representations by leveraging the attention mechanisms of Transformer models. We used an Italian T5 model fine-tuned on a topic classification task, trained on posts extracted from online forums and categorized into 11 classes, to evaluate different label representation selection strategies. We employed a context-mixing score called Value Zeroing to assess each token’s impact and select candidate representations from the training set. Our results include a detailed qualitative analysis to identify which label choices most significantly affect classification outcomes, suggesting that using our approach to select label representations can enhance performance.

October 2024 · Alessio Miaschi

EMNLP 2024 Paper

Our paper ‘Evaluating Large Language Models via Linguistic Profiling’ (with Felice Dell’Orletta and Giulia Venturi) has been accepted at EMNLP 2024! In this paper, we introduce a novel evaluation methodology designed to test LLMs’ sentence generation abilities under specific linguistic constraints. Drawing on the ‘linguistic profiling’ approach, we rigorously investigate the extent to which five LLMs of varying sizes, tested in both zero- and few-shot scenarios, effectively adhere to (morpho)syntactic constraints. Our findings shed light on the linguistic proficiency of LLMs, revealing both their capabilities and limitations in generating linguistically-constrained sentences.

September 2024 · Alessio Miaschi

New Position

I am happy to announce that I have started a new position as a full-time researcher (RTDA) at CNR-ILC! In this role, I will be contributing to the PNRR FAIR project, which focuses on addressing critical research questions, methodologies, models, and technologies in Artificial Intelligence (AI). Specifically, I am part of Spoke 5 of FAIR, which is focused on the analysis and development of high-quality AI systems.

June 2024 · Alessio Miaschi

LREC-COLING 2024 Paper

Our paper ‘Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)’ (with Felice Dell’Orletta and Giulia Venturi) has been accepted at LREC-COLING 2024! In the paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an intermediate task that predicts structural linguistic properties of sentences modifies its performance in the target task of predicting sentence-level complexity. Our study encompasses diverse experiments conducted on Italian and English datasets, employing both monolingual and multilingual T5 models at various sizes. Results obtained for both languages and in cross-lingual configurations show that linguistically motivated intermediate fine-tuning generally has a positive impact on target task performance, especially when applied to smaller models and in scenarios with limited data availability.

February 2024 · Alessio Miaschi

Premio di ricerca "Dino Buzzetti" 2023

I am really happy to announce that I was awarded the 2023 ‘Premio di ricerca Dino Buzzetti’ prize by AIUCD! The prize will fund a small project based on the development of a book recommender system. In particular, during the project, I will first collect reviews from popular Digital Social Reading platforms, such as Goodreads and Anobii. Then, I plan to conduct a comprehensive human evaluation campaign to verify whether (and how) certain reviews are likely to be of interest to future readers. Another aspect of the project involves exploring the capabilities of Large Language Models (LLMs) to generate fictional reviews and assessing their impact on the overall evaluation campaign. All the information is available at this link: http://www.aiucd.it/premio-buzzetti-2023-a-alessio-miaschi/.

January 2024 · Alessio Miaschi

SANER 2024 Paper

Our paper ‘T-FREX: A Transformer-based Feature Extraction Method from Mobile App Reviews’ (with Quim Motger, Felice Dell’Orletta, Xavier Franch and Jordi Marco) has been accepted at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2024. In the paper we present T-FREX, a Transformer-based, fully automatic approach for mobile app review feature extraction. First, we collect a set of ground truth features from users in a real crowdsourced software recommendation platform and transfer them automatically into a dataset of app reviews. Then, we use this newly created dataset to fine-tune multiple LLMs on a named entity recognition task under different data configurations. We assess the performance of T-FREX with respect to this ground truth, and we complement our analysis by comparing T-FREX with a baseline method from the field. Finally, we assess the quality of new features predicted by T-FREX through an external human evaluation. Results show that T-FREX on average outperforms the traditional syntactic-based method, especially when discovering new features from a domain for which the model has been fine-tuned.

December 2023 · Alessio Miaschi