Deep learning for genomics

 Fixed-term contract · M2 internship · 6 months · Bac+5 / Master's level · Museum National d'Histoire Naturelle (MNHN) · Paris (France)

 Start date: 1 January 2022


deep learning, gene regulation, epigenomics, DNA sequence motifs


Improvements in DNA sequencing techniques have led to an explosion in the number and completeness of fully sequenced genomes. One of the major goals in the field is to annotate these DNA sequences, that is, to associate a biological function with sequence motifs located at different positions along the genome. In parallel with this progress in genome reading and annotation, algorithmic advances and the use of graphics processing units (GPUs) have enabled the development and application of deep neural networks in many different contexts. This has led to several breakthroughs in domains such as computer vision, speech recognition and machine translation. As a data-driven domain, genomics has followed this trend, and pioneering studies have demonstrated that deep neural networks can annotate the genome with functional marks directly from the DNA sequence (1). A game-changing advantage of these tools is their ability to predict a learned annotation on a variant of the genome, i.e. to predict the effect of mutations. As a first proof of concept of the ability to predict the effect of mutations on a large scale, we are currently developing the mutasome approach, in which every base pair (bp) of a single genome is mutated individually to see the effect on a given genome annotation (2). If one can successfully predict the effect of mutations, it also becomes possible to design new sequences with controlled properties, a field now known as genome writing. In this project, we wish to use different strategies to leverage the power of deep learning for genome writing. The student will start with the design of nucleosome positioning sequences in the yeast Saccharomyces cerevisiae, which can be tested experimentally in collaboration with JB Boulé at MNHN. Once this approach is validated, interesting follow-ups can be envisioned, notably extending the procedure to the human genome.
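To make the mutasome idea concrete, here is a minimal sketch of in silico single-base mutagenesis. It is an illustration only, not the method of reference (2): the function names are hypothetical, and toy_score is a made-up stand-in (the fraction of A/T bases) for a trained network that would predict a real annotation such as nucleosome occupancy.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of A/T bases."""
    return sum(base in "AT" for base in seq) / len(seq)


def mutation_effects(
    seq: str, score: Callable[[str], float]
) -> List[Dict[str, float]]:
    """Return, for each position, the score change of each substitution."""
    reference = score(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        per_base = {}
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the unmutated base
            mutant = seq[:i] + alt + seq[i + 1:]
            per_base[alt] = score(mutant) - reference
        effects.append(per_base)
    return effects


# Scan a short example sequence: 6 positions x 3 substitutions each.
effects = mutation_effects("ACGTAC", toy_score)
for position, per_base in enumerate(effects):
    print(position, per_base)
```

With a real model, score would wrap the network's prediction for the annotation of interest, and mutants would typically be scored in batches on a GPU rather than one at a time.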
Our team takes part in the computing group of the international GP-write consortium (3), which aims to design synthetic genomes. The recruited student will develop methods, based on deep learning and statistical physics, that will enable the creation of genomic sequences with tailored properties from scratch.
(1) Khodabandelou et al., PeerJ Computer Science, 2020.
(2) Routhier et al., Genome Research, 2020.
(3)


Application procedure:

Deadline: 1 November 2021


Prof. Julien Mozziconacci

Posted on 6 September 2021, displayed until 1 November 2021