Publications

Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation

RepL4NLP @ ACL 2020

In this paper we present a comparison between the linguistic knowledge encoded in the internal representations of a contextual Language Model (BERT) and a context-independent one (Word2vec). We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a context-independent model. We also find that BERT is able to encode sentence-level properties even within single-word embeddings, obtaining results comparable to, or even better than, those obtained with sentence representations.

Tracking the Evolution of Written Language Competence in L2 Spanish Learners

BEA @ ACL 2020

In this paper we present an NLP-based approach for tracking the evolution of written language competence in L2 Spanish learners using a wide range of linguistic features automatically extracted from students’ written productions. Beyond reporting classification results for different scenarios, we explore the connection between the most predictive features and the teaching curriculum, finding that our set of linguistic features often reflects the explicit instructions that students receive during each course.

Prerequisite or Not Prerequisite? That's the problem! An NLP-based Approach for Concept Prerequisite Learning

CLiC-it 2019

This paper presents a method for classifying prerequisite relations between educational concepts. The proposed system was developed by adapting a classification algorithm designed for sequencing Learning Objects to the task of ordering concepts from a computer science textbook. To apply the system to the new task, for each concept we automatically created a learning unit from the textbook using two criteria, based on concept occurrences and on burst intervals. The results are promising and suggest that the approach could benefit substantially from further improvements.

Linguistically-Driven Strategy for Concept Prerequisites Learning on Italian

BEA @ ACL 2019

We present a new concept prerequisite learning method for Learning Object (LO) ordering that exploits only linguistic features extracted from textual educational resources. The method was tested in both cross-domain and in-domain scenarios, for both Italian and English. Additionally, we performed experiments based on an incremental training strategy to study the impact of the training set size on the classifier's performance. The paper also introduces ITA-PREREQ, to the best of our knowledge the first Italian dataset annotated with prerequisite relations between pairs of educational concepts, and describes the automatic strategy devised to build it.

Trattamento Automatico della Lingua per la creazione di percorsi didattici personalizzati

Ital-IA 2019

This contribution illustrates the activities carried out by the ItaliaNLP Lab in the field of education, showing how Natural Language Processing (NLP) tools for the linguistic profiling of texts and for content access make it possible to: i) model linguistic abilities and assess their evolution over the course of learning, and ii) support the creation of educational resources and learning paths personalized to the learners' competences and to new ways of accessing content, including in e-learning contexts.

Deep learning for social sensing from tweets

CLiC-it 2015

Distributional Semantic Models (DSMs), which represent words as vectors of weights over a high-dimensional feature space, have proved very effective in representing semantic or syntactic word similarity. For certain tasks, however, it is important to represent contrasting aspects such as polarity, opposite senses or the idiomatic use of words. We present a method for computing discriminative word embeddings that can be used in sentiment classification or in any other task where one needs to discriminate between contrasting semantic aspects. We also present an experiment on identifying reports of natural disasters in tweets by means of these embeddings.

Il Codice Pelavicino tra edizione digitale e Public History

Umanistica Digitale

The Codice Pelavicino Digitale Project aims to publish an online digital edition of this significant 13th-century manuscript. This paper first addresses the features of the edition and the related issues. Secondly, we explain the motivations for choosing a digital edition as a medium, addressing the background and the common concerns in the context of academia and of ecclesiastical and historical archives. Finally, we give insights into the international standard adopted to mark up the text, XML-TEI, and into EVT, the tool used to generate the final website and display texts and images.