Crossword Space: Latent Manifold Learning for Italian Crosswords and Beyond

Abstract

Answering crossword puzzle clues is a challenging retrieval task that requires matching linguistically rich and often ambiguous clues with appropriate solutions. While traditional retrieval-based strategies are commonly used to address this problem, wordplay and other lateral thinking strategies limit the effectiveness of conventional lexical and semantic approaches. In this work, we frame clue answering as an information retrieval problem, exploiting the potential of encoder-based Transformer models to learn a shared latent space between clues and solutions. In particular, we propose for the first time a collection of siamese and asymmetric dual encoder architectures trained to capture the complex properties and relations characterizing crossword clues and their solutions for the Italian language. After comparing various architectures for this task, we show that the strong retrieval capabilities of these systems extend to neologisms and dictionary terms, suggesting their potential use in linguistic analyses beyond the scope of language games.
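As an illustration of what retrieval in a shared latent space involves, the following minimal sketch ranks candidate solutions by cosine similarity to a clue embedding. It is not the architecture proposed in the paper: the three-dimensional vectors are hand-written placeholders standing in for the outputs of trained clue and solution encoders, and the names `cosine`, `rank_solutions`, `SOLUTION_VECS`, and `CLUE_VEC` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_solutions(clue_vec, solution_vecs, top_k=3):
    """Return the top_k (solution, score) pairs, most similar first."""
    scored = [(word, cosine(clue_vec, vec)) for word, vec in solution_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Assumption: placeholder embeddings. In a real dual encoder these would be
# produced by a solution encoder and a clue encoder trained on clue-solution pairs.
SOLUTION_VECS = {
    "ROMA": [0.9, 0.1, 0.0],
    "MILANO": [0.6, 0.5, 0.1],
    "GATTO": [0.0, 0.2, 0.9],
}
CLUE_VEC = [0.8, 0.2, 0.1]  # stand-in for the embedding of "Capitale d'Italia"

if __name__ == "__main__":
    for word, score in rank_solutions(CLUE_VEC, SOLUTION_VECS):
        print(f"{word}\t{score:.3f}")
```

In a siamese setup the same encoder would produce both kinds of vectors, whereas an asymmetric dual encoder would use separate encoders for clues and solutions; the ranking step sketched here is the same in either case.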

Publication
In Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025, Cagliari) (upcoming)
Alessio Miaschi
Full-time researcher (RTD) in Natural Language Processing