PhD in Machine Learning and causal inference for imaging and multi-omics data
Fixed-term contract · PhD thesis · 36 months · Master's degree (Bac+5) · Institut Curie, CNRS-UMR168 · Paris (France)
Starting date: 1 October 2022
Machine and Deep learning · Causal inference · Information theory · Live cell imaging · Single cell multi-omics
PhD Funding: CNRS-Imperial College joint PhD Program
Thesis supervisor: Hervé Isambert (DR, CNRS)
Research team: Reconstruction, Analysis and Evolution of Biological Networks
Research Institute: Institut Curie, CNRS-UMR168, Paris
Live cell imaging microscopy and next generation sequencing technologies, now routinely used in cell biology labs, produce massive amounts of time-lapse images and gene expression data at single cell resolution. However, this wealth of state-of-the-art biological data remains largely under-explored due to the lack of unsupervised methods and tools to analyze it without preconceived hypotheses. This highlights the need to develop new Machine Learning and Artificial Intelligence strategies to better exploit the richness and complexity of the information contained in time-resolved cell biology data.
The Isambert lab recently developed novel causal inference methods and tools (https://miic.curie.fr) to learn cause-effect relationships in a variety of biological or clinical datasets, from single-cell transcriptomic and genomic alteration data (Verny et al 2017, Sella et al 2018, Desterke et al 2020) to medical records of patients (Cabeli et al 2020, Sella et al 2022, Ribeiro Dantas et al 2022). These machine learning methods combine multivariate information analysis with interpretable graphical models (Li et al 2019, Cabeli et al 2021, Ribeiro Dantas et al 2022) and outperform other methods on a broad range of benchmarks, achieving better results with ten to a hundred times fewer samples.
The first objective of the present PhD project is to extend these causal inference methods to analyze time-resolved cell biology data, for which the information about cellular dynamics can facilitate the discovery of novel cause-effect functional processes. The second objective, in collaboration with Barbara Bravi’s team at Imperial College London, will be to parametrize reconstructed causal networks in order 1- to predict the course of a disease from early temporal information and 2- to generate synthetic temporal data, which will then be used to improve causal inference methods through an iterative ‘adversarial’ model training approach.
These novel advanced causal inference methods for time series data will then be applied to analyze two types of high-throughput time-resolved cell biology data: 1- time-lapse images of i) tumour-on-chip cellular ecosystems in collaboration with Maria Carla Parrini (Inst Curie) and ii) differentiating hematopoietic stem cells in collaboration with Leïla Perié (Inst Curie) and 2- single-cell transcriptomic data of i) breast cancer under treatment in vitro in collaboration with Luca Magnani (Imperial College) and ii) differentiating hematopoietic stem cells from the Perié lab (Inst Curie).
1. Sella N, Hamy AS, Cabeli V, Darrigues L, Laé M, Reyal F, Isambert H: Interactive exploration of a global clinical network from a large breast cancer cohort. Npj Digital Med, under minor revision.
2. Cabeli V, Li H, Ribeiro-Dantas M, Simon F, Isambert H: Reliable causal discovery based on mutual information supremum principle for finite dataset. Why21 @ NeurIPS 2021 (2021).
3. Cabeli V, Verny L, Sella N, Uguzzoni G, Verny M, Isambert H: Learning clinical networks from medical records based on information estimates in mixed-type data. PLoS Comput Biol 16(5):e1007866 (2020).
4. Desterke C, Petit L, Sella N, Chevallier N, Cabeli V, Coquelin L, Durand C, Oostendorp RAJ, Isambert H, Jaffredo T, Charbord P: Inferring gene networks in bone marrow Hematopoietic Stem Cell-supporting stromal niche populations. iScience 23(6):101222 (2020).
5. Li H, Cabeli V, Sella N, Isambert H: Constraint-based causal structure learning with consistent separating sets. Advances in Neural Information Processing Systems (NeurIPS) 32, 14257-14266 (2019).
6. Sella N, Verny L, Uguzzoni G, Affeldt S, Isambert H: MIIC online: a web server to reconstruct causal or non-causal networks from non-perturbative data. Bioinformatics 34(13):2311-2313 (2018).
7. Verny L, Sella N, Affeldt S, Singh PP, Isambert H: Learning causal networks with latent variables from multivariate information in genomic data. PLoS Comput Biol 13(10):e1005662 (2017).
Expected profile of the candidate
Applicants should have a strong background in machine learning or computer science and a keen interest in analyzing complex heterogeneous data of biological and medical interest. Applicants should be proficient in programming and willing to interact with scientists from different disciplines, from data scientists to medical doctors. Applicants are expected to show a clear capacity for independent and creative thinking. Experience in causal inference analysis is a plus but not required as long as the applicant has a strong motivation to learn.
Application procedure: Please send a complete CV, Master's transcripts with marks and the name(s) of one or more references to firstname.lastname@example.org. Informal inquiries are welcome. Starting date: Fall 2022; the position will remain open until filled.
Application deadline: none
Posted on 27 April 2022; displayed until 25 June 2022