
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

We introduce MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos. MAIA differs from other available video benchmarks in its design, its …

Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond

Answering crossword puzzle clues presents a challenging retrieval task that requires matching linguistically rich and often ambiguous clues with appropriate solutions. While traditional retrieval-based strategies are commonly used to address this …

MAIA: a Benchmark for Multimodal AI Assessment

We introduce MAIA (Multimodal AI Assessment), a multimodal dataset developed as a core component of a competence-oriented benchmark designed for fine-grained investigation of the reasoning abilities of Visual Language Models (VLMs) on videos. The …

The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence

Recent progress in Large Language Models (LLMs) has led to impressive capabilities in Natural Language Generation (NLG). However, standard evaluation benchmarks often focus on surface-level performance and are predominantly English-centric, limiting …

Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models

Correctly identifying characters and substrings of words should be a basic but essential ability of any Language Model that aims to proficiently understand and produce language. Despite this, the majority of Pre-trained Language Models (PLMs) are …

Evaluating Lexical Proficiency in Neural Language Models

We present a novel evaluation framework designed to assess the lexical proficiency and linguistic creativity of Transformer-based Language Models (LMs). We validate the framework by analyzing the performance of a set of LMs of different sizes, in …

Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors

Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, …

Parallel Trees: a novel resource with aligned dependency and constituency syntactic representations

The paper introduces Parallel Trees, a novel multilingual treebank collection that includes 20 treebanks for 10 languages. The distinguishing property of this resource is that the sentences of each language are annotated using two syntactic …

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

An increasing number of pretrained Large Language Models (LLMs) are being released, though most are designed primarily for English. While they can often handle other languages due to contamination or some degree of multilingual …

Leveraging encoder-only large language models for mobile app review feature extraction

Mobile app review analysis presents unique challenges due to the low quality, subjective bias, and noisy content of user-generated documents. Extracting features from these reviews is essential for tasks such as feature prioritization and sentiment …