All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

We introduce MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos. MAIA differs from other available video benchmarks in its design, its …

Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond

Answering crossword puzzle clues presents a challenging retrieval task that requires matching linguistically rich and often ambiguous clues with appropriate solutions. While traditional retrieval-based strategies can commonly be used to address this …

MAIA: a Benchmark for Multimodal AI Assessment

We introduce MAIA (Multimodal AI Assessment), a multimodal dataset developed as a core component of a competence-oriented benchmark designed for fine-grained investigation of the reasoning abilities of Visual Language Models (VLMs) on videos. The …

The OuLiBench Benchmark: Formal Constraints as a Lens into LLM Linguistic Competence

Recent progress in Large Language Models (LLMs) has led to impressive capabilities in Natural Language Generation (NLG). However, standard evaluation benchmarks often focus on surface-level performance and are predominantly English-centric, limiting …

Beyond the Spelling Miracle: Investigating Substring Awareness in Character-Blind Language Models

Evaluating Lexical Proficiency in Neural Language Models

Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors

Parallel Trees: a novel resource with aligned dependency and constituency syntactic representations

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

Leveraging encoder-only large language models for mobile app review feature extraction