Keywords
biomedical ontology, machine translation, LLM, knowledge management
Description
The adoption of the SNOMED CT ontology in France depends on its translation into French, which is a careful, quality-driven but time-consuming process. We believe that it could be accelerated with the help of NLP and knowledge engineering tools. In particular, the generative approaches used by Large Language Models (LLMs) lead us to think that high-quality translations can be suggested to experts. In addition, we hypothesize that translations could be improved by two further sources: French clinical texts, which illustrate how terms are used in practice, and the available structure of SNOMED CT (hierarchical and non-hierarchical relationships).
The aim of the internship is to develop one or several LLM-based translation approaches that suggest: (i) a principal, extensive translation of each preferred term, (ii) a set of valid synonyms, and (iii) a full-text, unambiguous description of the term. Note that (i), (ii) and (iii) must follow strict editorial rules. Formalizing these rules and exploring how they can be provided to the LLM through prompting is part of the objectives of the internship.
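To make the prompting idea concrete, the following is a minimal sketch of how a concept, its SNOMED CT context, a few French usage examples, and the editorial rules might be assembled into a single prompt. Everything in it is an assumption made for illustration: the `Concept` class, the `build_prompt` function, the placeholder concept identifier, the sample rules, and the requested JSON keys are hypothetical, and no particular LLM API is assumed. The actual rules remain to be formalized during the internship.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical container for the part of a SNOMED CT concept shown to the LLM."""
    concept_id: str                      # placeholder identifier, not a real SNOMED CT code
    preferred_term: str                  # English preferred term to translate
    parents: list[str] = field(default_factory=list)                    # hierarchical context
    relationships: list[tuple[str, str]] = field(default_factory=list)  # (attribute, value) pairs


def build_prompt(concept: Concept, rules: list[str], examples: tuple[str, ...] = ()) -> str:
    """Assemble a plain-text prompt asking for the three outputs (i), (ii) and (iii)."""
    lines = [
        "Translate the following SNOMED CT concept into French.",
        f"Preferred term (English): {concept.preferred_term}",
    ]
    if concept.parents:
        lines.append("Parent concepts: " + "; ".join(concept.parents))
    for attribute, value in concept.relationships:
        lines.append(f"Relationship: {attribute} = {value}")
    if examples:
        lines.append("French clinical usage examples:")
        lines.extend(f"- {sentence}" for sentence in examples)
    lines.append("Editorial rules:")
    lines.extend(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    lines.append("Return JSON with keys: preferred_term, synonyms, description.")
    return "\n".join(lines)


# Illustrative data only; the rules below are invented placeholders.
concept = Concept(
    concept_id="0000000",
    preferred_term="Myocardial infarction",
    parents=["Ischemic heart disease"],
    relationships=[("Finding site", "Myocardium structure")],
)
rules = [
    "Use the singular form.",
    "Do not use abbreviations in the preferred term.",
]
prompt = build_prompt(concept, rules, examples=("Patient admis pour infarctus du myocarde.",))
print(prompt)
```

Keeping prompt construction separate from the model call would make it easy to compare variants, for example with and without the hierarchical context or the clinical examples, which matches the hypotheses stated above.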
We would like to evaluate the quality of the suggestions empirically, both to compare our translations with those provided by an existing solution and to iteratively improve the methods we propose. A principal objective of the internship is therefore to design evaluation strategies for our translations: the intern will propose and justify strategies for comparing translations objectively.
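As one possible starting point for such a comparison, the sketch below scores a candidate translation against an expert reference with a token-level F1 measure. This is only an illustrative baseline chosen here, not a strategy prescribed by the project: the function names `normalize`, `token_f1` and `mean_f1` are hypothetical, and the sample terms are invented for the example. Lexical overlap ignores valid synonyms and rule compliance, so it would need to be complemented by other strategies.

```python
import unicodedata
from collections import Counter


def normalize(term: str) -> list[str]:
    """Lowercase, apply Unicode NFC normalization, and split on whitespace."""
    return unicodedata.normalize("NFC", term).lower().split()


def token_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two terms."""
    cand, ref = normalize(candidate), normalize(reference)
    if not cand or not ref:
        # Two empty strings count as a match; one empty string does not.
        return float(cand == ref)
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs: list[tuple[str, str]]) -> float:
    """Average token F1 over (candidate, reference) pairs."""
    return sum(token_f1(c, r) for c, r in pairs) / len(pairs)


# Invented example pairs: (system suggestion, expert reference).
pairs = [
    ("Infarctus du myocarde", "infarctus du myocarde"),
    ("infarctus myocardique", "infarctus du myocarde"),
]
print(round(mean_f1(pairs), 3))
```

Running the same scoring on the output of the existing solution and on each proposed method would give a first like-for-like comparison on a shared set of reference translations.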