2-year postdoctoral position: Modeling genetic variant pleiotropy using machine learning

 Fixed-term contract (CDD) · Postdoc · 24 months · Bac+8 / PhD (Doctorat), Grandes Écoles · UR 7537 BioSTM (Biostatistique, Traitement et Modélisation des données biologiques) · Paris (France) · Salary: according to experience

 Start date: 1 September 2022


Keywords: statistical genetics, pleiotropy, complex traits and diseases, GWAS, post-genomic data, causal inference, Mendelian randomization, Gaussian mixture models, unsupervised and semi-supervised learning



One striking observation in human genetics today is that, as research advances in understanding the genetic architecture of complex traits and the etiology of heritable diseases, new paradigms keep emerging that reveal ever more of the complexity of biological systems. Indeed, the human genome contains only about 20,000 protein-coding genes, hardly more than the worm Caenorhabditis elegans, for example. The complexity of the human organism, i.e. the great diversity of its cell types and functions, must therefore arise from highly combinatorial and finely tuned regulation of the expression of these genes. As a consequence, each genetic element (e.g. a variant or a gene) is expected to influence several traits. This phenomenon is called pleiotropy.

Although pleiotropy is extremely common and thought to play a central role in the genetic architecture of human complex traits and diseases, it remains one of the least understood phenomena in human genetics.

One of the most compelling lines of evidence for pleiotropy comes from genome-wide association studies (GWASs), which estimate the effect of genetic variants across the genome on a studied trait. GWASs have identified countless genetic variants significantly associated with many complex traits and diseases, very likely because of pleiotropy, yet in the vast majority of cases they cannot pinpoint a causal mechanism. Consequently, many applications and methodological developments have reused GWAS results, principally to study relationships between traits.

One booming field that uses GWAS summary statistics is causal inference between traits through Mendelian randomization. The principle of Mendelian randomization is simple and analogous to that of randomized controlled trials: the effects of variant alleles (instead of a drug/placebo assignment) are modeled through regression to estimate and test the causal effect of an exposure trait on an outcome trait. Although extremely appealing, Mendelian randomization relies on a strong assumption: the absence of horizontal pleiotropy, which occurs when a variant has independent effects on both the exposure and the outcome. Pleiotropy has nevertheless tended to be neglected in Mendelian randomization applications. In a key paper published in Nature Genetics in 2018 (Verbanck et al. 2018), we showed that horizontal pleiotropy cannot be neglected: it occurs in almost 50% of causal relationships, biasing causal estimates and inflating the false discovery rate of causal relationships.
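To make the regression step concrete, here is a minimal sketch in plain Python of two standard summary-statistic Mendelian randomization estimators. The fixed-effect inverse-variance weighted (IVW) estimator regresses variant effects on the outcome on variant effects on the exposure through the origin, and therefore assumes no horizontal pleiotropy. MR-Egger regression adds an intercept that absorbs directional horizontal pleiotropy. These are common textbook estimators chosen for illustration, not the methods of this project or of the cited papers. The `Variant` class and the function names are hypothetical, and the sketch ignores refinements such as random-effects scaling, second-order weights and correlated (linkage disequilibrium) instruments.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """GWAS summary statistics for one genetic instrument (hypothetical container)."""
    beta_exposure: float  # per-allele effect on the exposure trait
    beta_outcome: float   # per-allele effect on the outcome trait
    se_outcome: float     # standard error of beta_outcome


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Weighted regression of beta_outcome on beta_exposure through the origin,
    with weights 1 / se_outcome**2. Forcing the intercept to zero encodes the
    assumption of no (directional) horizontal pleiotropy.
    """
    num = sum(v.beta_exposure * v.beta_outcome / v.se_outcome**2 for v in variants)
    den = sum(v.beta_exposure**2 / v.se_outcome**2 for v in variants)
    return num / den, math.sqrt(1.0 / den)  # (estimate, standard error)


def egger_regression(variants):
    """MR-Egger: weighted regression of beta_outcome on beta_exposure with an intercept.

    Each variant is first oriented so that its exposure effect is positive.
    A non-zero intercept suggests directional horizontal pleiotropy; the slope
    is a pleiotropy-robust causal estimate under the InSIDE assumption.
    """
    xs, ys, ws = [], [], []
    for v in variants:
        sign = 1.0 if v.beta_exposure >= 0 else -1.0
        xs.append(sign * v.beta_exposure)
        ys.append(sign * v.beta_outcome)
        ws.append(1.0 / v.se_outcome**2)
    # Closed-form weighted least squares for y = intercept + slope * x.
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx**2)
    intercept = (sy - slope * sx) / sw
    return intercept, slope


if __name__ == "__main__":
    effects = (0.05, 0.10, 0.15, 0.20)  # made-up exposure effects
    # True causal effect 0.5 plus a constant pleiotropic shift of 0.02.
    biased = [Variant(bx, 0.02 + 0.5 * bx, 0.01) for bx in effects]
    print("IVW:", ivw_estimate(biased))
    print("MR-Egger (intercept, slope):", egger_regression(biased))
```

On these made-up values, the constant pleiotropic shift pushes the IVW estimate up to about 0.633 instead of the true 0.5, whereas the MR-Egger intercept recovers the 0.02 shift and its slope recovers 0.5.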

On a related topic, in 2019 we published a proof-of-concept paper in Genome Biology (Jordan, Verbanck, and Do 2019) that not only detected horizontal pleiotropy but also showed that pleiotropy can be quantified at the level of individual genetic variants, and that it is widespread across the human genome.

We now intend to go further. We have conceptualized five biological mechanisms leading to pleiotropy: 1) linkage disequilibrium; 2) causality between traits; 3) genetic correlation between traits; 4) high polygenicity of traits; 5) horizontal pleiotropy (true independent effects of a variant on two traits). We propose to build a comprehensive framework that uses machine learning to disentangle these five states of pleiotropy, provide a genome-wide map of pleiotropy for genetic variants, and infer causal relationships between traits. Specifically, we propose 1) to improve on the method published in our proof-of-concept paper using unsupervised approaches based on penalized methods, random forests or deep learning; and 2) to explore semi-supervised learning using a creative data-labeling strategy that we have developed.

Human genetic variant databases are increasingly useful, from the interpretation of genetic analyses to clinical interpretation. We strongly believe that a database describing the pleiotropic nature of variants will complement existing databases and serve the community. Importantly, the full code of the resulting methodology and the genome-wide map of pleiotropy will be made publicly available and presented in scientific publications.

References related to the postdoc position

Verbanck, Marie, Chia-Yen Chen, Benjamin Neale, and Ron Do. 2018. “Detection of Widespread Horizontal Pleiotropy in Causal Relationships Inferred from Mendelian Randomization Between Complex Traits and Diseases.” Nature Genetics. doi:10.1038/s41588-018-0099-7.

Jordan, Daniel M., Marie Verbanck, and Ron Do. 2019. “HOPS: A Quantitative Score Reveals Pervasive Horizontal Pleiotropy in Human Genetic Variation Is Driven by Extreme Polygenicity of Human Traits and Diseases.” Genome Biology 20 (1): 222. doi:10.1186/s13059-019-1844-7.

Morrison, Jean, Nicholas Knoblauch, Joseph H. Marcus, Matthew Stephens, and Xin He. 2020. “Mendelian Randomization Accounting for Correlated and Uncorrelated Pleiotropic Effects Using Genome-Wide Summary Statistics.” Nature Genetics. doi:10.1038/s41588-020-0631-4.

Darrous, Liza, Ninon Mounier, and Zoltán Kutalik. 2020. “Simultaneous Estimation of Bi-Directional Causal Effects and Heritable Confounding from GWAS Summary Statistics.” medRxiv. doi:10.1101/2020.01.27.20018929.

The successful candidate:

  • will hold a PhD in data science related to statistics or artificial intelligence; candidates with a more theoretical background who show a strong interest in life-science applications are also welcome;
  • will be enthusiastic about transdisciplinary research and open science at the interface between data science and genetics;
  • will show a clear interest in using applied methodology to advance biological understanding;
  • will have good programming skills, preferably in R and/or Python;
  • may have a background in biology or genetics;
  • should be open-minded and willing to work as part of a team with other lab members;
  • should be willing to take part in the supervision of interns and PhD students;
  • will have a good command of English, since we collaborate closely with Mount Sinai Hospital in New York City, USA.

Scientific environment

The postdoctoral fellow will work with Dr Marie Verbanck, Assistant Professor (Maître de Conférences) at the Faculté de Pharmacie, Université Paris Cité, within the BioSTM unit (Biostatistique, Traitement et Modélisation des données biologiques, UR 7537). BioSTM's mission is to develop cutting-edge statistical methodologies to address real-life biological problems, with an emphasis on reproducible and open research.
The postdoctoral position is part of PleioMap, a project funded by the ANR (Agence Nationale de la Recherche) and led by Dr Marie Verbanck.


Application procedure: please send a concise email describing your research interests and experience, together with an up-to-date CV, to Marie Verbanck (marie.verbanck@u-paris.fr). Please include the names and contact details of two referees in your application.

Application deadline: not specified


Marie Verbanck



Offer published on 27 April 2022, displayed until 25 June 2022