Postdoc in Deep Learning for Precision Medicine from genomics data

Fixed-term contract (CDD) · Postdoc · 24 months · Bac+8 / PhD, Grandes Écoles · UMMISCO · Paris (France)

Keywords

Deep Learning · Bioinformatics · Artificial Intelligence · Genomics · Metagenomics · Microbiome · Microbiota · Machine Learning

Description

2 Post-doc positions

Deep Learning for Precision Medicine from genomics data

We propose two post-doc positions (each for 2 years) on the development of new deep learning architectures for prediction from genomics data. The first position focuses on transformers that can handle large numbers of DNA reads from metagenomics data. The second position consists of integrating the cost of data acquisition into a multi-modal architecture that combines different sources of genomics data. Both approaches will be applied to the prediction of metabolic diseases.

The successful candidates will have the opportunity to draft and publish several scientific manuscripts based on their results, and to significantly expand their expertise, network and track record. The initial employment contract will be for 2 years, with the possibility of an extension and of applying for a permanent position.

Context:

Deep learning (DL) has brought about a radical change in the field of pattern recognition and in machine learning (ML) itself, improving on most earlier models for learning tasks such as image classification and natural language processing. DL can be applied directly to raw metagenomics data generated by high-throughput shotgun sequencing. By analogy with natural language processing (NLP), sequencing reads can be seen as sentences and k-mers as words. A growing number of studies apply DL techniques to explore various biological processes, with promising results.
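To make the reads-as-sentences analogy concrete, the minimal Python sketch below splits a read into overlapping k-mers (stride 1), the usual "words" of such models, and counts them over a sample. The function names, the default k = 4, and the choice to skip k-mers containing ambiguous bases such as N are illustrative assumptions, not part of the project description.

```python
from collections import Counter
from typing import Iterable, List

VALID_BASES = set("ACGT")


def kmer_tokens(read: str, k: int = 4) -> List[str]:
    """Split a DNA read into overlapping k-mers (stride 1): the 'words' of the read.

    k-mers containing ambiguous bases (e.g. 'N') are skipped, an illustrative choice.
    """
    read = read.upper()
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    return [kmer for kmer in kmers if set(kmer) <= VALID_BASES]


def kmer_counts(reads: Iterable[str], k: int = 4) -> Counter:
    """Bag-of-k-mers view of a sample: k-mer frequencies over all of its reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmer_tokens(read, k))
    return counts


print(kmer_tokens("ACGTACGT"))
# → ['ACGT', 'CGTA', 'GTAC', 'TACG', 'ACGT']
```

Counting k-mers over all reads of a sample gives a bag-of-words view of a metagenome; transformer-based models instead map each token to a learned embedding and keep the token order within each read.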

The main scientific goal of the DeepIntegrOmics ANR project (2022-2026) is to significantly improve DL-based methodological frameworks that use multi-omics data for precision medicine, in two main directions: first, to support reliable end-to-end prediction from raw metagenomics data; second, to improve classification accuracy and stratification by integrating other omics data. Two more applied objectives are to propose novel approaches for identifying multi-omics biomarkers of cardiometabolic disease stages, and to propose means of patient stratification through the interpretation of these neural network architectures.

From a methodological perspective, the expected results are both a DL architecture for cost-sensitive data integration and open-source embeddings for multi-omics classification. The successful applicants will take part in this ANR project and join the DeepIntegrOmics Consortium.

Subject 1: Deep Learning for Metagenomics data (YC)

Location: 91 bvd Hôpital, 75013 Paris (Sorbonne University Campus at Pitié Salpêtrière)

The main scientific goal of this postdoctoral project is to significantly improve DL methodological frameworks for metagenomics data, based on transformer architectures and other state-of-the-art DL architectures. Metagenomic data can be seen as bags of DNA sequences from different species. We will start by working with pre-trained DNA embeddings. The main hypothesis to be explored is that recent text-embedding methods based on Transformers can be transferred to DNA in order to obtain bags of species embeddings. The final step is to obtain a metagenomic embedding by aggregating the species embeddings; to this end, DeepSets-like architectures will be explored. This will allow us to address research questions about DL architectures for end-to-end disease prediction directly from raw sequenced microbiome data (such as Nanopore sequencing data).
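The aggregation step can be illustrated with the DeepSets form f(X) = ρ(Σ_{x∈X} φ(x)), where φ encodes each species embedding independently and ρ maps the pooled vector to a prediction. The sketch below is a minimal, untrained, pure-Python instance under stated assumptions: the class name, the single ReLU layer for φ, the linear-plus-sigmoid ρ, the toy 3-dimensional embeddings and the random weights are all hypothetical; an actual model would use learned multi-layer networks in a DL framework such as PyTorch.

```python
import math
import random
from typing import List

Vector = List[float]


def stable_sigmoid(z: float) -> float:
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class DeepSetsSketch:
    """Permutation-invariant set model f(X) = rho(sum_{x in X} phi(x)).

    Hypothetical illustration: phi is one linear layer + ReLU applied to each
    element, rho is a linear layer + sigmoid; weights are random, not trained.
    """

    def __init__(self, d_in: int, d_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_phi = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_hidden)]
        self.w_rho = [rng.gauss(0.0, 1.0) for _ in range(d_hidden)]

    def phi(self, x: Vector) -> Vector:
        # Encode one element (e.g. one species embedding) independently of the others.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w_phi]

    def embed_set(self, elements: List[Vector]) -> Vector:
        # Sum pooling: the result does not depend on the order of the elements.
        pooled = [0.0] * len(self.w_phi)
        for x in elements:
            pooled = [p + h for p, h in zip(pooled, self.phi(x))]
        return pooled

    def predict_proba(self, elements: List[Vector]) -> float:
        # rho: map the pooled (metagenome-level) embedding to a disease probability.
        z = sum(w * p for w, p in zip(self.w_rho, self.embed_set(elements)))
        return stable_sigmoid(z)


# Toy usage with three hypothetical 3-dimensional species embeddings.
model = DeepSetsSketch(d_in=3, d_hidden=4, seed=0)
species = [[0.1, 0.2, 0.3], [0.5, -0.1, 0.0], [1.0, 0.0, -1.0]]
print(model.predict_proba(species))
print(model.predict_proba(species[::-1]))  # same value, up to rounding
```

Because sum pooling ignores order, reordering the species embeddings leaves the output unchanged up to floating-point rounding, which is what makes this family of models a natural fit for bags of sequences.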

The approach will be validated on both real and simulated data. One application will be to predict cardiometabolic disease (CMD) stages and progression from a uniquely phenotyped database (one of the largest existing datasets, from the EU H2020 project Metacardis). Altogether, these objectives aim to support translational and precision medicine (i.e. classification into disease groups) while deploying the models for routine use in the nutrition department of the Pitié Salpêtrière Hospital. The post-doc will be co-supervised by Jean-Daniel Zucker (UMMISCO-IRD/SU and NUTRIOMICS-INSERM/SU) and Yann Chevaleyre (LAMSADE - University Paris-Dauphine).

Subject 2: Deep cascade models for genomics data

Location: Lab IBISC, University Paris-Saclay (Univ. Evry)

The objective is to develop a new method for the adaptive and cost-sensitive integration of multi-modal data through a deep cascade model.

In a medical context, each data source (omics, clinical, metagenomics, ...) provides a complementary view of the patient, so predictions based on all data sources are expected to be more reliable than single-source predictions. The main rationale for this project is both to improve the integration of data sources and to take their cost into account. Indeed, for practical use in clinical contexts, predictive models must maximize prediction accuracy while minimizing their cost. The cost can represent financial resources but also time, side effects for patients, or any other finite resource.
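One common way to formalize this trade-off, given here only as an illustrative assumption rather than the project's chosen objective, is to add the acquisition cost to the prediction loss. Writing $S_\theta(x)$ for the set of sources acquired for patient $x$, $c_s$ for the cost of source $s$, $\ell$ for a prediction loss, $f_\theta$ for the model and $\lambda \ge 0$ for a trade-off weight:

```latex
\min_{\theta}\;
\mathbb{E}_{(x,y)}\!\left[
  \ell\big(f_\theta(x_{S_\theta(x)}),\, y\big)
  \;+\; \lambda \sum_{s \in S_\theta(x)} c_s
\right]
```

With $\lambda = 0$ this reduces to a pure prediction-loss objective with no incentive to save on acquisition; increasing $\lambda$ pushes the model toward cheaper sets of sources.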

The solution will exploit the fact that some patients are easier to predict than others and do not need all data sources. It will be based on a deep cascade model: at each iteration of this procedure, we must decide which data source to add to the model and how to integrate it. This approach raises several scientific problems: extracting relevant information from each source through embeddings, selecting the set of data sources, integrating the different sources into the hidden layers of the neural network, and making a final prediction that is sensitive to both accuracy and cost. An end-to-end learning approach will be used to fit the parameters of all components of the cascade (data source selection, integration, prediction).
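The inference side of such a cascade can be sketched as follows: sources are acquired one at a time, and prediction stops as soon as the model is confident enough, so easy patients incur a lower cost. This is a simplified, hypothetical illustration: the fixed acquisition order, the confidence threshold, the `Source` class, the costs and the toy predictor are all assumptions, whereas the project plans to learn source selection, integration and prediction end-to-end.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, List[float]]


@dataclass
class Source:
    name: str
    cost: float  # hypothetical acquisition cost (money, time, patient burden, ...)


def cascade_predict(
    patient: Features,
    sources: Sequence[Source],
    predict: Callable[[Features], float],
    threshold: float = 0.9,
) -> Tuple[float, List[str], float]:
    """Acquire sources in a fixed order and stop once max(p, 1 - p) >= threshold.

    Returns (probability, names of acquired sources, total cost paid).
    """
    acquired: Features = {}
    total_cost = 0.0
    p = 0.5  # uninformative prediction before any source is acquired
    for src in sources:
        acquired[src.name] = patient[src.name]  # pay for and add this source
        total_cost += src.cost
        p = predict(acquired)
        if max(p, 1.0 - p) >= threshold:
            break  # confident enough: skip the remaining, costlier sources
    return p, list(acquired), total_cost


def toy_predict(acquired: Features) -> float:
    """Hypothetical stand-in for the deep model: sigmoid of the summed features."""
    score = sum(sum(values) for values in acquired.values())
    return 1.0 / (1.0 + math.exp(-score))


# Toy usage: sources ordered from cheapest to most expensive (hypothetical costs).
SOURCES = [Source("clinical", 1.0), Source("metagenomics", 10.0), Source("metabolomics", 20.0)]
easy = {"clinical": [5.0], "metagenomics": [0.0], "metabolomics": [0.0]}
hard = {"clinical": [0.1], "metagenomics": [0.2], "metabolomics": [3.0]}
print(cascade_predict(easy, SOURCES, toy_predict))  # stops after "clinical"
print(cascade_predict(hard, SOURCES, toy_predict))  # uses all three sources
```

In this toy run, a patient whose clinical data already yields a confident prediction stops after the cheapest source, while an ambiguous patient goes through the whole cascade and pays the full cost.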

The post-doc will be supervised by Blaise Hanczar (IBISC lab, Université Paris-Saclay, Univ. Evry)

Profile

We are seeking candidates with a PhD in computer science or applied mathematics, and:

o Strong background in machine/deep learning;

o Publications at peer-reviewed AI conferences (e.g. NeurIPS, CVPR, ICML, ICLR, AAAI, …)

o Good knowledge of programming and machine/deep learning frameworks (Python, scikit-learn, PyTorch);

o Experience in bioinformatics is a plus

To apply

Qualified candidates should send, as soon as possible, a cover letter, a curriculum vitae, and a list of three references to deepintregromics@gmail.com, with “DeepIntegromics application” in the subject of the email, addressed to Yann Chevaleyre and Jean-Daniel Zucker for Subject 1 or to Blaise Hanczar for Subject 2.



Application


Deadline: none specified

Contacts

Yann Chevaleyre, Jean-Daniel Zucker and Blaise Hanczar

deepintegromics@gmail.com

Offer published on 24 November 2022, displayed until 30 June 2023.