A selection of scientific publications that present the findings, methodologies, and innovations developed within the E-MIMIC project.

Towards AI-Assisted Inclusive Language Writing in Italian Formal Communications [1]
Formal communications such as public calls, announcements, or regulations are expected to show respect for diversity in terms of gender, race, age, and disability. However, human writers often lack adequate inclusive writing skills. For instance, they tend to overuse the masculine as a neutral form, mainly because they have learned to write from biased text examples. To address this issue, we propose leveraging Generative Artificial Intelligence to support inclusive language writing. Focusing on formal Italian communications, we have designed and developed an AI-assisted tool for non-inclusive text detection and reformulation. Working jointly with a team of linguistic experts, we first define a set of linguistic criteria needed to model inclusive writing forms in Italian. Based on these criteria, we collect and annotate a dataset of Italian administrative documents enriched with fine-grained inclusiveness annotations. Finally, we train deep learning models on the collected data for two tasks: non-inclusive language detection and inclusive language reformulation. We evaluate the trained models both quantitatively and through human assessment. The best detection model correctly classifies 89% of the sentences, whereas the best reformulation model produces 73% fully correct reformulations. Both models have been integrated into Inclusively, a writing assistance tool that acts as a text proofreader and self-learning aid for non-expert writers. Once a non-inclusive piece of text is detected, the tool suggests inclusive reformulations. It also explains the models’ outputs to increase system transparency, and it allows expert end users to provide further annotations for fine-tuning the system. The trained models and the writing assistance tool are publicly available for research purposes.
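The abstract describes a two-stage flow: a detection model flags non-inclusive text, and a reformulation model proposes inclusive alternatives. As a rough illustration of that control flow only, the sketch below replaces both trained deep learning models with a toy lexicon lookup. The lexicon `TOY_LEXICON` and the functions `detect`, `is_non_inclusive`, and `reformulate` are hypothetical names, not the project's API, and the lexicon entries are illustrative rather than drawn from the project's linguistic criteria.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for the trained models: masculine forms
# used as "neutral", mapped to inclusive alternatives (illustrative entries only).
TOY_LEXICON: dict[str, list[str]] = {
    "i candidati": ["le persone candidate", "le candidate e i candidati"],
    "gli studenti": ["la popolazione studentesca", "le studentesse e gli studenti"],
    "i dipendenti": ["il personale dipendente"],
}


@dataclass
class Flag:
    """A flagged span with its character offsets and suggested alternatives."""
    span: str
    start: int
    end: int
    alternatives: list[str]


def detect(sentence: str) -> list[Flag]:
    """Stage 1 stand-in: flag every lexicon match, case-insensitively, in text order."""
    flags = []
    for term, alternatives in TOY_LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            flags.append(Flag(m.group(0), m.start(), m.end(), alternatives))
    return sorted(flags, key=lambda f: f.start)


def is_non_inclusive(sentence: str) -> bool:
    """Sentence-level decision: non-inclusive if at least one span is flagged."""
    return bool(detect(sentence))


def reformulate(sentence: str) -> str:
    """Stage 2 stand-in: replace each flagged span with its first alternative."""
    pieces, cursor = [], 0
    for flag in detect(sentence):
        if flag.start < cursor:  # skip spans overlapping an earlier replacement
            continue
        pieces.append(sentence[cursor:flag.start])
        pieces.append(flag.alternatives[0])
        cursor = flag.end
    pieces.append(sentence[cursor:])
    return "".join(pieces)


text = "Entro il 30 giugno i candidati devono presentare la domanda."
print(is_non_inclusive(text))  # True
print(reformulate(text))
```

Here `reformulate(text)` returns "Entro il 30 giugno le persone candidate devono presentare la domanda." A fixed lexicon ignores context and grammatical agreement, which is the kind of limitation that learned detection and reformulation models are meant to overcome.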

Inclusively: An AI-Based Assistant for Inclusive Writing [2]
Inclusive writing is compulsory in formal communications. However, employees in private organizations, universities, and ministries often lack inclusive writing skills.
For example, although Italian grammar has both masculine and feminine word forms, many official documents show a disrespectful prevalence of the masculine form. To promote inclusive writing practices, we present Inclusively, a language support tool that leverages natural language processing techniques to automatically identify instances of non-inclusive language and suggest more inclusive alternatives.
The tool serves as a text proofreader and, at the same time, fosters self-learning of inclusive writing forms.
The recorded demo of the tool, available at https://youtu.be/3uiW_ti8wmY, shows how end users can interact with Inclusively to enter new text, visualize the non-inclusive passages, explore the list of alternative forms, and provide feedback or human annotations for system fine-tuning.
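The demo mentions end users supplying feedback or human annotations for fine-tuning. One plausible way to turn such interactions into training data, sketched here under stated assumptions, is to keep accepted suggestions and user corrections as (source, target) pairs and serialize them as JSON Lines. The `Feedback` record, its fields, and the rule that a user edit overrides the tool's suggestion are all hypothetical and are not taken from Inclusively's implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Hypothetical record of one user interaction with a suggestion."""
    original: str                    # sentence flagged as non-inclusive
    suggestion: str                  # reformulation proposed by the tool
    accepted: bool                   # whether the user accepted the suggestion
    user_edit: Optional[str] = None  # the user's own reformulation, if provided


def to_training_pairs(events: list[Feedback]) -> list[dict[str, str]]:
    """Keep only interactions that yield a usable target sentence."""
    pairs = []
    for event in events:
        if event.user_edit:      # an explicit human correction takes priority
            target = event.user_edit
        elif event.accepted:     # an accepted suggestion counts as a positive example
            target = event.suggestion
        else:                    # rejected without a correction: no usable target
            continue
        pairs.append({"source": event.original, "target": target})
    return pairs


def to_jsonl(pairs: list[dict[str, str]]) -> str:
    """Serialize one JSON object per line, keeping accented characters readable."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)


events = [
    Feedback("I dipendenti devono timbrare.", "Il personale dipendente deve timbrare.", True),
    Feedback("Gli studenti possono iscriversi.", "La popolazione studentesca può iscriversi.", False),
    Feedback("I candidati riceveranno una email.", "Le candidate e i candidati riceveranno una email.",
             False, user_edit="Le persone candidate riceveranno una email."),
]
print(to_jsonl(to_training_pairs(events)))
```

The rejected suggestion without a correction is dropped, so only two pairs are written. Whether rejections should instead be kept as negative examples is a design choice this sketch does not address.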

Building Foundations for Inclusiveness through Expert-Annotated Data [3]
Natural Language Understanding and Generation models have a limited ability to grasp the nuances of inclusive communication because they are trained on massive datasets that often include significant portions of non-inclusive content. Even models specifically designed for non-inclusive language detection or reformulation largely disregard inclusiveness-related features that are likely correlated with these nuances, such as the discourse type, the level of inclusiveness, and the intended context of use. To assess the importance of such additional features, we collect a new corpus of Italian administrative documents manually annotated by linguistic experts.
The linguistic experts not only highlight non-inclusive text snippets and propose possible reformulations, but also assign multi-aspect labels related to different nuances of inclusive language. We empirically show that a multi-task learning approach leveraging these multi-aspect annotations can improve non-inclusive text reformulation performance, confirming the potential of expert-annotated data for inclusive language processing.
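Multi-task learning is commonly implemented by adding auxiliary losses to the main training objective. Assuming that setup (the abstract does not specify the exact formulation), a minimal version combines the reformulation loss with one classification loss per annotated aspect, L = L_reform + λ · Σ_a L_a. The aspect names mirror the features listed above (discourse type, level of inclusiveness, intended context of use); the weight λ (`aspect_weight`), the function names, and the toy probabilities are hypothetical.

```python
import math


def cross_entropy(probs: list[float], target: int) -> float:
    """Negative log-likelihood of the target class under a predicted distribution."""
    return -math.log(probs[target])


def multitask_loss(
    token_probs: list[list[float]],
    token_targets: list[int],
    aspect_probs: dict[str, list[float]],
    aspect_targets: dict[str, int],
    aspect_weight: float = 0.5,
) -> float:
    """Main reformulation loss plus a weighted sum of auxiliary aspect losses."""
    # Main task: mean token-level cross-entropy of the generated reformulation.
    main = sum(cross_entropy(p, t) for p, t in zip(token_probs, token_targets))
    main /= len(token_targets)
    # Auxiliary tasks: one classification head per expert-annotated aspect.
    aux = sum(cross_entropy(aspect_probs[a], aspect_targets[a]) for a in aspect_targets)
    return main + aspect_weight * aux


loss = multitask_loss(
    token_probs=[[0.5, 0.5], [0.9, 0.1]],  # toy 2-word vocabulary, 2 output tokens
    token_targets=[0, 0],
    aspect_probs={
        "discourse_type": [0.25, 0.75],
        "inclusiveness_level": [0.6, 0.4],
        "context_of_use": [0.8, 0.2],
    },
    aspect_targets={"discourse_type": 1, "inclusiveness_level": 0, "context_of_use": 0},
)
print(round(loss, 4))
```

In a real model the probabilities would come from softmax outputs and the loss would be minimized with an automatic differentiation framework; this plain-Python version only shows how the main and auxiliary terms combine.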

Artificial Intelligence to Preserve French and Italian: The E-MIMIC Project [4]
The current development of large language models and generative artificial intelligence in the language industry raises questions about the preservation of multilingualism. Consider that about 93% of the training data (by word count) of GPT-3, the third version of the model family behind ChatGPT, one of the most widely used conversational agents, was in English (Brown et al., 2020: 14).
Within the European Union, much has been asked about how artificial intelligence (hereafter AI) could foster or, conversely, endanger multilingualism (Raus 2023). One of the most debated issues concerns the morphosyntactic and semantic problems raised by the automatic processing of languages with AI, particularly with regard to feminization, that is, the use of feminine forms, in Romance languages (Sofo 2022). The development of generative AI has only reinforced this problem in texts generated by virtual assistants such as ChatGPT (Ghosh & Caliskan 2023).
To reflect on these aspects, we present preliminary yet promising results from the Empowering a Multilingual Inclusive Communication (E-MIMIC) project. Launched in 2019 on the initiative of the Politecnico di Torino and funded since 2022 under Italy’s National Recovery and Resilience Plan (PNRR), the project grew out of the aim of creating a tool for the inclusive reformulation of texts produced by public administrations.

Inclusion and Natural Language Processing in the Era of Generative Artificial Intelligence [5]
This volume is the result of a collective reflection by authors from various universities who set out to examine inclusion, natural language processing, and generative artificial intelligence as part of the work of a Research Project of Relevant National Interest (PRIN 2022), funded under Italy’s National Recovery and Resilience Plan (PNRR) and coordinated by the Politecnico di Torino in partnership with the University of Bologna and the University of Rome Tor Vergata. In the age of artificial intelligence (AI), topics such as inclusion and language become even more significant because of the linguistic, social, political, and other impacts that new AI-powered tools can have on a society that is fair and respectful of multilingualism and of every difference.
The book is organized in three sections. The first gathers contributions that open lines of research based on some of the philosophical, linguistic, computational, and/or social challenges posed by the latest generative AI. This section closes with the presentation of the aforementioned PRIN project, Empowering Multilingual Inclusive Communication (E-MIMIC), which explores how AI can safeguard multilingualism by proposing alternative models (see also Crosthwaite, Baisa, 2023) and by providing the example of a neural-network-based tool, Inclusively, that rewrites administrative texts in an inclusive way. That essay introduces the second part of the volume, which presents the work of the E-MIMIC team, and of external collaborators, on the languages covered by the project: the Italian of Italy (see Simone Arcari, Virginia Laconi), the French of France (see Michela Tonti and Martina Ailén García), and the Spanish of Spain (Natalia Peñín Fernández). Finally, the third part includes essays intended to open ethical, educational, and linguistic debates and reflections on AI and on forms of inclusion, not only gender inclusion but also linguistic inclusion, and on inclusive communication in a broader sense.

“The limits of LLMs are the limits of the world”: considerations on the role of linguistics in enhancing AI-generated language quality [6]
In this short article I will present some observations that arose from the activities I carried out in 2024 as part of the PRIN 2022 E-MIMIC project.
In the first part, I will briefly describe the increasing focus on inclusive communication and the risks associated with the use of artificial intelligence systems, such as the loss of language diversity and the spread of biases, which I will illustrate with a few examples from neural machine translation.
In the second part, I will discuss how the flattening of language and the loss of multilingualism caused by AI systems may coincide with the loss of the ability to conceptualize the world differently from dominant models. Observing language acquisition and development in children reveals the social conditioning that permeates a language and the role of social experience in comprehension, a factor that distinguishes human competence from artificial language production.
Linguistics proves central to understanding both the complexity of human language and the need for strong human involvement in data-driven methods, as pursued in the E-MIMIC project, whose innovative deep-learning methodology aims to develop artificial intelligence systems that promote equality in communication.

Artificial Intelligence, Corpora, and Linguistic Diversity: Issues and Perspectives [7]
Recent developments in deep learning, in which a system based on neural networks modeled on those of the human brain learns from data (Le Cun 2019), have allowed artificial intelligence (hereafter AI) to expand considerably across all areas of human activity and research (see, among others, Zouinar 2020; von Braun et al. 2021). In linguistics (Tavosanis 2018), AI has found its main application in the language industry for a range of purposes, such as translation (Koehn 2020) or language learning (Miras et al. 2019). The advent of generative AI, capable of producing texts, images, or videos, following the launch of ChatGPT in November 2022, has only accelerated its spread.
However, these developments raise many questions, including the following. What impact does English have on multilingualism during the deep learning of language technologies (Kim et al. 2019; Vetere 2022), such as tools for writing, interpreting, translation, and dubbing?
What is at stake in unsupervised machine learning, that is, learning without human intervention, given that the corpora used for training can contribute to the spread of biases (Bartoletti 2020)?
These two questions, and many others, stem from the approaches and methods used in AI research…

Discourse Analysis and Artificial Intelligence for Inclusive Writing: The E-MIMIC Project [8]
This article presents the E-MIMIC project, an application that aims to eliminate bias and non-inclusive language in administrative texts written in European countries, starting with those written in Romance languages. It presents a methodology built on discursive criteria inspired by French discourse analysis; these criteria are used to annotate a corpus of institutional documents, which in turn serve to train neural networks through deep learning. Deep language modeling architectures are used to automatically identify non-inclusive text snippets, suggest alternative forms, and produce inclusive reformulations. A preliminary evaluation on a benchmark dataset for Italian shows promising results, which encourage finalizing the application and extending it to other languages, such as French.

References
- [1] Salvatore Greco, Moreno La Quatra, Luca Cagliero, Tania Cerquitelli (2025). Towards AI-Assisted Inclusive Language Writing in Italian Formal Communications. ACM Transactions on Intelligent Systems and Technology, 16(4), Article 79, pp. 1–24. Published 10 June 2025. https://doi.org/10.1145/3729237
- [2] Moreno La Quatra, Salvatore Greco, Luca Cagliero, Tania Cerquitelli (2023). Inclusively: An AI-Based Assistant for Inclusive Writing. ECML/PKDD (7) 2023, pp. 361–365.
- [3] Moreno La Quatra, Salvatore Greco, Luca Cagliero, Michela Tonti, Francesca Dragotto, Rachele Raus, Stefania Cavagnoli, Tania Cerquitelli (2024). Building Foundations for Inclusiveness through Expert-Annotated Data. EDBT/ICDT Workshops 2024.
- [4] Rachele Raus, Michela Tonti (2025). L’intelligence artificielle pour préserver le français et l’italien : le projet E-MIMIC. Langages, no. 237, pp. 21–41.
- [5] Rachele Raus (2025). Inclusione ed elaborazione del linguaggio naturale nell’era dell’intelligenza artificiale generativa – Introduzione. In AA. VV., Inclusione ed elaborazione del linguaggio naturale nell’era dell’intelligenza artificiale generativa, pp. 7–15.
- [6] Valeria Zotti (2025). “The limits of LLMs are the limits of the world”: considerations on the role of linguistics in enhancing AI-generated language quality. In AA. VV., Inclusione ed elaborazione del linguaggio naturale nell’era dell’intelligenza artificiale generativa, pp. 201–210.
- [7] Rachele Raus, Michela Tonti (2025). Intelligence artificielle, corpus et diversité linguistique : enjeux et perspectives. Langages, no. 237 (special issue), pp. 7–20.
- [8] Rachele Raus, Michela Tonti, Tania Cerquitelli, Luca Cagliero, Giuseppe Attanasio, Moreno La Quatra, Salvatore Greco (2022). L’analyse du discours et l’intelligence artificielle pour réaliser une écriture inclusive : le projet E-MIMIC. In 8e Congrès Mondial de Linguistique Française, Les Ulis: EDP Sciences, vol. 138, pp. 1–15.
