Genetically-Informed Optimal Transport to advance discoveries and validation in Human genetics

 Internship · M2 internship · 6 months    Bac+5 / Master   Institut Curie · Paris 05 (France)  ~600/month

 Start date: 1 February 2026

Keywords

statistical genetics, machine learning, optimal transport, complex diseases, monogenic diseases, ortholog genes, validation of genetic findings

Description

Context

Genome-wide association studies (GWASs) scan the genome to identify specific regions (e.g. genetic variants) that are significantly associated with complex traits and diseases. Over the past two decades, countless associations have been identified in the human genome: according to the GWAS Catalog, 7,326 publications and 938,544 unique genetic variant-trait associations had been reported as of July 2025. However, in complex diseases, the biological mechanisms underlying GWAS associations are challenging to establish, and the links between genes and diseases remain mostly unelucidated. A systematic review reported only 309 experimentally validated non-coding GWAS variants (Alsheikh et al., 2022). Global computational approaches, complementary to functional studies, are therefore still much needed to elucidate the biological mechanisms behind these associations.
An alternative strategy is to use knowledge from well-characterized systems to inform and potentially validate genetic findings. Examples include monogenic diseases, model organisms, and structured phenotype ontologies. These resources offer rich information about gene functions and disease mechanisms. The challenge is then to align and compare these heterogeneous sources in a coherent way. For this purpose, we propose to use a machine learning approach, namely optimal transport (OT), which provides a principled framework to compare distributions across domains. OT has been applied to alignment tasks where no ground truth exists, such as matching phenotypes or integrating biological data. Its flexibility makes it suitable for comparing disease profiles in the absence of direct supervision.
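For candidates unfamiliar with OT, the idea of comparing two distributions can be illustrated with a minimal sketch of entropy-regularised OT (Sinkhorn iterations) in plain Python. This is only an illustration under stated assumptions, not the method used in the project: the cost matrix, the two histograms, and the `reg` and `n_iter` values are made up for the example.

```python
import math

def sinkhorn(cost, a, b, reg=0.5, n_iter=200):
    """Return an entropy-regularised transport plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel: low-cost pairs get a large weight
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows to match a, then columns to match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy data: 2 source items, 3 target items, arbitrary costs
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 0.5]]
a = [0.5, 0.5]            # mass on the source items
b = [1 / 3, 1 / 3, 1 / 3]  # mass on the target items
plan = sinkhorn(cost, a, b)
```

Each entry `plan[i][j]` is the amount of mass moved from source item `i` to target item `j`; rows sum to `a` and columns sum to `b`, and most of the mass flows along low-cost pairs.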

Aim of the internship

The internship will aim to develop genetically-informed optimal transport (OT) frameworks (Peyré and Cuturi, 2019) to support the interpretation of findings in complex diseases in a biologically consistent way.
The objectives of the internship could include:
• Studying the relationship between complex diseases and Mendelian diseases using phenotypic data and prior knowledge;
• Studying the relationship between human genes and their orthologs in model organisms.
The technological developments could include:
• Expanding the databases that we have already put together;
• Designing and implementing cost functions for optimal transport that reflect the structure of the biological data;
• Integrating genetic prior knowledge in optimal transport frameworks;
• Developing multimodal optimal transport frameworks to integrate multiple sources of data.
By completing these objectives, the intern will contribute to the development of computational tools that support the validation and interpretation of genetic associations through structured phenotype and genotype knowledge.
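As an illustration of what a cost function integrating genetic prior knowledge might look like, here is a minimal sketch. Everything in it is an assumption made for the example: the function names, the choice of Jaccard distance, the `prior_weight` parameter, and the placeholder identifiers (`P1`, `G1`, ...), which are not real phenotype terms or genes. The actual design is part of the internship.

```python
def jaccard_distance(set_a, set_b):
    """1 - |intersection| / |union|; defined as 0.0 when both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

def genetically_informed_cost(complex_diseases, mendelian_diseases, prior_weight=0.5):
    """Build a cost matrix mixing phenotype distance and gene-set distance.

    Each disease is a (phenotype_terms, genes) pair of sets.
    prior_weight in [0, 1] controls how much the genetic prior contributes.
    """
    cost = []
    for pheno_c, genes_c in complex_diseases:
        row = []
        for pheno_m, genes_m in mendelian_diseases:
            d_pheno = jaccard_distance(pheno_c, pheno_m)
            d_genes = jaccard_distance(genes_c, genes_m)
            row.append((1 - prior_weight) * d_pheno + prior_weight * d_genes)
        cost.append(row)
    return cost

# Hypothetical placeholder profiles (not real data)
complex_diseases = [({"P1", "P2", "P3"}, {"G1", "G2"})]
mendelian_diseases = [({"P1", "P2"}, {"G1"}),
                      ({"P4"}, {"G9"})]
cost = genetically_informed_cost(complex_diseases, mendelian_diseases)
```

A matrix built this way could then be passed to any OT solver. The convex combination is only one possible choice; the prior could instead enter as a constraint or as a regularisation term.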

The successful candidate:

• will be a Master 2 (M2) student in data science with a focus on statistics or artificial intelligence; candidates with a more theoretical background who show a strong interest in life science applications are also welcome;
• will be enthusiastic about transdisciplinary research and open science at the interface between data science and genetics;
• will show a clear interest in using applied science methodology to benefit biological understanding;
• will have good programming skills, preferably in Python;
• may have a background in biology or genetics;
• should be open-minded and willing to work in a team with other lab members.

Scientific environment

Starting date: February 2026
The 6-month Master's internship will be supervised by Dr Marie Verbanck, Professor of statistical genetics at Institut Curie, and Dr Mourad El Hamri, Assistant Professor of artificial intelligence at the Faculté de Pharmacie of Université Paris Cité.
This internship could lead to a PhD thesis that builds on this project and focuses on genetically-informed optimal transport methods, with applications in disease similarity analysis and integrative validation of complex trait associations.

Application

Procedure: To apply, please send a concise email describing your research interests and experience, together with an up-to-date CV, to Marie Verbanck and Mourad El Hamri (marie.verbanck@curie.fr, mourad.el-hamri@u-paris.fr). Names and contact details of referees are appreciated but not mandatory.

Deadline: 31 October 2025

Contacts

 Marie Verbanck
 marie.verbanck@curie.fr

 Mourad El Hamri
 mourad.el-hamri@u-paris.fr

 http://marie.verbanck.free.fr/InternPosition_2025.pdf

Offer published on 29 July 2025, displayed until 31 October 2025