Study of the combination of large language models and knowledge structures for translating SNOMED CT

 Internship · M2 internship · 6 months · Bac+5 / Master's level · Inria Paris, HeKA team, and Agence du Numérique en Santé · Paris (France) · internship stipend

Start date: 1 April 2024

Keywords

biomedical ontology, machine translation, LLM, knowledge management

Description

The adoption of the SNOMED CT ontology in France depends on its translation into French, a process that yields high-quality results but is time-consuming. We believe it could be accelerated with the help of NLP and knowledge engineering tools. In particular, the generative approaches used by Large Language Models (LLMs) suggest that high-quality candidate translations could be proposed to experts. In addition, we hypothesize that translations could be improved by two resources: French clinical texts, which illustrate how terms are used in practice, and the structure of SNOMED CT itself (its hierarchical and non-hierarchical relationships).

The aim of the internship is to develop one or several LLM-based translation approaches that suggest: (i) a principal, extensive translation of each preferred term, (ii) a set of valid synonyms, and (iii) an unambiguous full-text description of the term. Note that (i), (ii) and (iii) must follow strict editorial rules. Formalizing these rules and exploring how they can be provided to the LLM through prompting are among the objectives of the internship.
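
As a purely illustrative sketch of how editorial rules and ontology context might be passed to an LLM through prompting, the Python snippet below assembles a prompt from a term, its neighbouring concepts, French usage examples and a list of rules. Everything in it is an assumption made for illustration: the `Concept` class, the `build_prompt` function, the example rules and the example concept values (which have not been checked against a SNOMED CT release) are hypothetical. Formalizing the actual rules is part of the internship itself, and no particular LLM API is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical, simplified view of a SNOMED CT concept."""
    preferred_term: str                                        # English preferred term
    parents: list[str] = field(default_factory=list)           # hierarchical ("is a") parents
    relations: dict[str, str] = field(default_factory=dict)    # non-hierarchical relationships
    usage_examples: list[str] = field(default_factory=list)    # French clinical sentences


def build_prompt(concept: Concept, rules: list[str]) -> str:
    """Assemble a plain-text prompt; sending it to an LLM is left out on purpose."""
    lines = [
        "Translate the following SNOMED CT term into French.",
        f"English preferred term: {concept.preferred_term}",
    ]
    if concept.parents:
        lines.append("Parent concepts: " + "; ".join(concept.parents))
    for attribute, value in concept.relations.items():
        lines.append(f"Relationship: {attribute} -> {value}")
    if concept.usage_examples:
        lines.append("French clinical usage examples:")
        lines.extend(f"- {sentence}" for sentence in concept.usage_examples)
    lines.append("Editorial rules to follow:")
    lines.extend(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    lines.append(
        "Return: (i) the principal translation, (ii) valid synonyms, "
        "(iii) an unambiguous full-text description."
    )
    return "\n".join(lines)


# Illustrative values only, not verified against a SNOMED CT release.
example_concept = Concept(
    preferred_term="Myocardial infarction",
    parents=["Ischemic heart disease"],
    relations={"Finding site": "Myocardium structure"},
    usage_examples=["Le patient a présenté un infarctus du myocarde en 2019."],
)
# Placeholder rules; the real editorial rules remain to be formalized.
example_rules = ["Use the singular form.", "Do not use abbreviations."]

prompt = build_prompt(example_concept, example_rules)
print(prompt)
```

Keeping the rules as a separate list makes it straightforward to test how adding, removing or rewording a rule changes the suggestions.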

 

We would like to evaluate the quality of the suggestions empirically, in particular to compare our translations with those provided by an existing solution, and also to improve the proposed methods iteratively. To this end, a principal objective of the internship is to design strategies for evaluating our translations. Accordingly, the intern will propose and justify strategies for comparing translations objectively.
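
One simple way to compare two translation systems objectively is to score each against expert reference translations with an automatic metric. The sketch below uses token-level F1, chosen here only as an example; the choice of metric is precisely what the intern is expected to propose and justify. The function names, concept identifiers and sample translations are hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a candidate and a reference translation."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(cand == ref)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_score(candidates: dict[str, str], references: dict[str, str]) -> float:
    """Average token F1 over all reference concepts (missing candidates score 0)."""
    scores = [token_f1(candidates.get(cid, ""), ref) for cid, ref in references.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: concept identifiers and translations are made up.
references = {"c1": "infarctus du myocarde", "c2": "fracture du fémur"}
system_a = {"c1": "infarctus du myocarde", "c2": "fracture fémorale"}
system_b = {"c1": "crise cardiaque", "c2": "fracture du fémur"}

score_a = mean_score(system_a, references)
score_b = mean_score(system_b, references)
print(f"system A: {score_a:.2f}, system B: {score_b:.2f}")
```

A surface metric like this penalizes valid synonyms that differ from the reference wording, so it would likely need to be complemented by expert review or by matching against the full set of accepted synonyms.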

Application

Procedure: By email to adrien.coulet@inria.fr, elisabeth.serrot-damatte@esante.gouv.fr and mael.le-gall@esante.gouv.fr

Deadline: 1 July 2024

Contacts

Adrien Coulet

 adrien.coulet@inria.fr

 https://filesender.renater.fr/?s=download&token=70f5963b-7163-4ec5-8037-0723739cb32e

Offer published on 15 February 2024, displayed until 1 July 2024