M1 internship - Study of the distance between monogenic and complex diseases using Optimal Transport

 Internship · M1 internship · 6 months · Bac+4 · BioSTM · Paris (France)

 Start date: 2 September 2024

Keywords

statistical genetics, complex diseases, monogenic diseases, optimal transport

Description

Context

Genome-wide association studies (GWASs) consist of scanning the genome to identify specific regions (e.g. genetic variants) that are significantly associated with complex traits and diseases. Over the past two decades, countless associations have been identified in the human genome: according to the GWAS Catalog, 6,763 publications and 580,109 unique genetic variant-trait associations had been reported as of March 2024. However, in complex diseases, the biological mechanisms underlying GWAS associations are challenging to establish, and the links between genes and diseases remain mostly unelucidated. A systematic review reported only 309 experimentally validated non-coding GWAS variants (Alsheikh et al. 2022). Global computational approaches are therefore still badly needed to elucidate the biological mechanisms behind these associations, and they are complementary to functional studies.

In contrast to complex diseases, monogenic diseases are generally caused by mutations in a single coding gene, which makes it easier to elucidate the link between the mutation and the gene, as well as the biological mechanism. Moreover, studies connecting monogenic and complex diseases have shown that gene modules from complex diseases are in fact related to genes from monogenic disorders with similar phenotypic abnormalities (Blair et al. 2013; Ghiassian et al. 2016; Kafkas et al. 2021; Melamed et al. 2015).

Aim of the internship

The internship aims to develop a framework to study the link between complex and monogenic diseases based on phenotypic signatures. Given similar phenotypic signatures, the framework could make it possible to assign genes to complex diseases, identifying not only potential “new” genes but also “new” biological mechanisms, and shedding light on the etiology of complex diseases. The first step will be to find an ingenious way to represent both complex and monogenic diseases in a common phenotype signature space. A second step will involve exploring suitable distances to study the distortion between complex and monogenic diseases. A third step will involve using discrete optimal transport (OT) to map complex diseases to monogenic diseases. Optimality here refers to the shortest path between phenotypic signatures and the lowest cost to “produce” such a phenotypic signature, that is, the pleiotropic phenotypic consequences of specific gene perturbations. Notably, discrete OT is based on the computation of a coupling matrix linking the two sets of diseases and, more generally, the two distributions. In our case, the coupling matrix can be interpreted from a biological point of view and will make it possible to “assign” genes from monogenic diseases to complex diseases. In particular, we will take care to use robust strategies to estimate it (Peyré and Cuturi 2019).
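
To give candidates a rough idea of the three steps described above, here is a minimal Python sketch. It is an illustration under stated assumptions, not the framework the internship will develop: the disease names, phenotype terms and gene labels are invented, the Jaccard distance is only one possible choice of distance, and the coupling matrix is estimated with a plain Sinkhorn iteration (entropy-regularised OT, as covered in Peyré and Cuturi 2019) written in pure Python for readability.

```python
import math

# Step 1 (assumption): each disease is represented as a set of phenotype terms.
# All names below are hypothetical toy data.
complex_diseases = {
    "complex_A": {"hyperglycemia", "obesity", "hypertension"},
    "complex_B": {"seizures", "cognitive_impairment"},
}
monogenic_diseases = {
    "mono_1": {"hyperglycemia", "obesity"},
    "mono_2": {"seizures", "cognitive_impairment", "ataxia"},
    "mono_3": {"hypertension"},
}
causal_gene = {"mono_1": "GENE_1", "mono_2": "GENE_2", "mono_3": "GENE_3"}


def jaccard_distance(a, b):
    """Step 2 (assumption): Jaccard distance between two phenotype sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sinkhorn(cost, a, b, reg=0.2, n_iter=1000):
    """Step 3: entropy-regularised discrete OT via Sinkhorn iterations.

    Returns a coupling matrix whose row sums approximate `a` and whose
    column sums approximate `b`.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


complex_names = list(complex_diseases)
mono_names = list(monogenic_diseases)

# Cost matrix: one row per complex disease, one column per monogenic disease.
cost = [
    [jaccard_distance(complex_diseases[c], monogenic_diseases[m]) for m in mono_names]
    for c in complex_names
]

# Uniform weights on both sets of diseases (assumption).
a = [1.0 / len(complex_names)] * len(complex_names)
b = [1.0 / len(mono_names)] * len(mono_names)

coupling = sinkhorn(cost, a, b)

# Naive reading of the coupling: assign to each complex disease the gene of
# the monogenic disease that receives the largest share of its mass.
assigned_genes = {}
for i, c in enumerate(complex_names):
    best_j = max(range(len(mono_names)), key=lambda j: coupling[i][j])
    assigned_genes[c] = causal_gene[mono_names[best_j]]

print(assigned_genes)
```

In practice one would likely rely on a dedicated OT library rather than a hand-written loop, and the choice of representation, distance, weights and regularisation strength is precisely what the internship is meant to investigate.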

The successful candidate:

  • will be a Master 1 student (or on a gap year, "césure") in data science related to statistics or artificial intelligence; candidates with a more theoretical background who show a strong interest in life science applications are also welcome;
  • will be enthusiastic about transdisciplinary research and open science at the interface between data science and genetics;
  • will show a clear interest in using applied science methodology to benefit biological understanding;
  • will have good programming skills, preferably in Python;
  • may have a background in biology or genetics;
  • should be open-minded and willing to work in a team with other lab members.

Scientific environment

Starting date: September 2024

The 6-month Master 1 internship will be supervised by Dr Marie Verbanck and Dr Mourad El Hamri, who are Assistant Professors (Maîtres de Conférences) at the Faculté de Pharmacie of Université de Paris, within the UR 7537 BioSTM unit (Biostatistique, Traitement et Modélisation des données biologiques). BioSTM’s mission is to develop cutting-edge statistical methodologies to address real-life biological problems, with an emphasis on reproducible science and open research.

References related to the internship

Alsheikh, Ammar J., Sabrina Wollenhaupt, Emily A. King, Jonas Reeb, Sujana Ghosh, Lindsay R. Stolzenburg, Saleh Tamim, Jozef Lazar, J. Wade Davis, and Howard J. Jacob. 2022. “The Landscape of GWAS Validation; Systematic Review Identifying 309 Validated Non-Coding Variants Across 130 Human Diseases.” BMC Medical Genomics 15 (1): 74. https://doi.org/10.1186/s12920-022-01216-w.

Blair, David R., Christopher S. Lyttle, Jonathan M. Mortensen, Charles F. Bearden, Anders Boeck Jensen, Hossein Khiabanian, Rachel Melamed, et al. 2013. “A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk.” Cell 155 (1): 70–80. https://doi.org/10.1016/j.cell.2013.08.030.

Ghiassian, Susan Dina, Jörg Menche, Daniel I. Chasman, Franco Giulianini, Ruisheng Wang, Piero Ricchiuto, Masanori Aikawa, et al. 2016. “Endophenotype Network Models: Common Core of Complex Diseases.” Scientific Reports 6 (1): 27414. https://doi.org/10.1038/srep27414.

Kafkas, Şenay, Sara Althubaiti, Georgios V. Gkoutos, Robert Hoehndorf, and Paul N. Schofield. 2021. “Linking Common Human Diseases to Their Phenotypes; Development of a Resource for Human Phenomics.” Journal of Biomedical Semantics 12 (1): 17. https://doi.org/10.1186/s13326-021-00249-x.

Melamed, Rachel D., Kevin J. Emmett, Chioma Madubata, Andrey Rzhetsky, and Raul Rabadan. 2015. “Genetic Similarity Between Cancers and Comorbid Mendelian Diseases Identifies Candidate Driver Genes.” Nature Communications 6 (1): 7033. https://doi.org/10.1038/ncomms8033.

Peyré, Gabriel, and Marco Cuturi. 2019. “Computational Optimal Transport: With Applications to Data Science.” Foundations and Trends in Machine Learning 11 (5-6): 355–607. https://doi.org/10.1561/2200000073.

Application

Procedure: To apply, please send a concise email describing your research interests and experience, together with an up-to-date CV, to Marie Verbanck and Mourad El Hamri (marie.verbanck@u-paris.fr, mourad.el-hamri@u-paris.fr). Names and contact details of referees are appreciated but not mandatory.

Deadline: None

Contacts

Marie Verbanck

 marie.verbanck@u-paris.fr

 http://marie.verbanck.free.fr/OTinternship.pdf

Offer published on 25 April 2024, displayed until 23 June 2024