PhD in Computational Regulatory Genomics

Competition · PhD thesis · 36 months · Bac+5 / Master · TAGC Inserm U1090 · Marseille (France)

Start date: 1 October 2022

Keywords

Bioinformatics, Deep Learning, Enhancer, Regulatory regions, ChIP-seq, RNA-seq, Machine Learning, Data-mining

Description

[EN] Characterisation of gene and exonic regulatory elements by multi-omics integration and machine learning

 

[FR] Caractérisation des éléments régulateurs génique et exonique par intégration multi-omiques et apprentissage automatique

 

 

Summary of thesis project:

Our work focuses on non-coding regions of the genome, in particular regulatory regions (enhancers) bound by transcription factors. High-throughput sequencing techniques such as ChIP-seq have made it possible to identify these regulatory regions and have revealed the importance of sequences formerly known as "junk DNA". Since 2015 we have been developing the ReMap project to characterise these regions using ChIP-seq data. ReMap was the first large-scale integrative initiative to process public data and reveal the complex architecture of the human regulatory landscape (ReMap: http://remap.univ-amu.fr/).

These regulatory regions are located in both intergenic and genic regions. Although intergenic regulatory regions are widely studied, fewer studies have focused on regulatory regions within gene bodies and coding regions (exons). Previous studies (1–3), as well as our ReMap data (4), reveal that transcription factors bind within the protein-coding regions (exons) of a large number of human genes.

 

Objectives:

The objective of the thesis is to study the characteristics and specificities of gene and exon regulatory elements using omics integration and machine learning approaches. In other words, the aim is to define the spectrum of regulatory elements within protein-coding regions and gene bodies. How widespread is the phenomenon of "regulatory" codes (regulatory regions) that overlap the genetic code (exons)?

From this, many questions arise:

  • Do these exonic regulatory regions play a role in regulating gene transcription?
  • Do they affect splicing?
  • Do they target specific genes?
  • Do exonic regulatory regions constrain the evolution of protein sequences?
  • What are the features that could allow the coexistence of regulatory (non-coding) and genetic (coding) codes?
  • Can we use Machine or Deep Learning approaches to highlight these features of regulatory elements within protein-coding regions?

First, the student will develop both "classical" and machine learning analyses to classify and characterise gene and exonic regulatory elements. Second, the student could perform a pan-cancer analysis to evaluate the effect of identified SNPs on these regulatory coding regions. Indeed, this new spectrum of regulatory elements within coding regions must be taken into account when assessing the impact of variants (SNPs for pathological mutations) from exome sequencing (which covers only the protein-coding regions of the genome) and cancer genome studies.

The approaches developed by the student will be applied in parallel to other data within the team, and to other species (mouse, Drosophila, plants). The challenge of the thesis lies in the quantity of data to be processed (Big Data) combined with the complexity of the genome. The biological objectives are central to this project, which goes beyond simply cataloguing regions. Our goal is to use bioinformatics and computational methods as a toolbox to address and explore new aspects of genome complexity.

A bioinformatician/computer scientist profile, coupled with a strong background in biology and a real appetite for biological complexity, would be a perfect fit for this PhD.

 

Context:

This thesis will be funded by a Ministry grant and will start on October 1st 2022. The thesis will take place at the TAGC Inserm U1090 on the Luminy campus, Marseille, France, and will be supervised by Benoit Ballester (INSERM).  The successful candidate will have to apply to the EDSV62 Doctoral School competition for a thesis grant.

Supervision:

The PhD will be supervised by Benoit Ballester (INSERM) in a small, close-knit team with a good atmosphere. PhD students have direct access to the supervisor and are encouraged to interact and work together.

Scientific and material conditions:

The work will be performed on the High-Performance Computing (HPC) cluster of the Aix-Marseille University site and/or on the HPC cluster of the IFB.

Profile and skills required:

We welcome applications from enthusiastic, focussed, and ambitious students.

Applicants should hold a Master's degree in bioinformatics/computational science with a strong background in genomics and an appetite for the biological objectives. We require applicants to hold, or be about to obtain, a First or upper Second class Honours degree (for Non-French Masters), or be in the top tier of their Masters (Mention AB minimum).

The candidate should have good programming skills in Python and R. They should be familiar with dimensionality reduction approaches and Deep/Machine Learning methods, and be able to use the relevant libraries (e.g. scikit-learn, Keras).

Experience working on HPC systems (SLURM) and with Git repositories is desirable.

Experience working with a workflow management system (Snakemake / Nextflow) is a plus.

This project focuses on both computational biology and genomics aspects. The student will benefit from the extensive bioinformatics expertise within the laboratory, ranging from NGS data analysis to protein-protein interaction networks. The work involves large-scale data analysis on HPC.

 

How to apply:

Candidates should first apply by email (benoit.ballester@inserm.fr) with the following:

  • CV
  • Cover Letter
  • Marks (Master 1 and Master 2, as well as current rank)
  • Reports from previous academic placement (e.g. Master 1 placement)

Deadline for receipt of applications: 16 May 2022

Competition date: June 8-10, 2022

Competition details: 10 min oral presentation, followed by 15 min of questions from the jury (10-14 members).

 

Selection process:

Only one applicant will be presented at the Doctoral School competition.

After interviews, we will select one PhD candidate, who will then apply to the EDSV62 Doctoral School competition on June 8-10, 2022 for a thesis grant from the Doctoral School. The selected candidate will be trained and prepared for this audition in order to maximise the chance of success. If the audition is successful, this thesis will be funded by a Ministry grant for 3 years and will start on October 1st 2022. The thesis will take place at the TAGC Inserm U1090 on the Luminy campus, Marseille, France, and will be supervised by Benoit Ballester (INSERM).

Details of competition/audition process:

  • 10 min oral presentation (self-presentation, Master's research work, and presentation of the PhD objectives and research plan for the next 3 years)
  • 15 min of questions from the jury (10-14 members).

 

Contact:

Dr. Benoit Ballester

Inserm U1090, TAGC

Campus de Luminy

13288 Marseille Cedex 9

France

Phone: +33 4 91 82 87 28

Email: benoit.ballester@inserm.fr

Email: benoit.ballester@univ-amu.fr

 

 

References:

1. Stergachis,A.B., Haugen,E., Shafer,A., Fu,W., Vernot,B., Reynolds,A., Raubitschek,A., Ziegler,S., LeProust,E.M., Akey,J.M., et al. (2013) Exonic transcription factor binding directs codon choice and impacts protein evolution. Science (New York, N.Y.), 342, 1367.

2. Khan,A.H., Lin,A. and Smith,D.J. (2012) Discovery and characterization of human exonic transcriptional regulatory elements. PLoS One, 7, e46098.

3. Hyder,S.M., Nawaz,Z., Chiappetta,C., Yokoyama,K. and Stancel,G.M. (1995) The protooncogene c-jun contains an unusual estrogen-inducible enhancer within the coding sequence. J Biol Chem, 270, 8506–8513.

4. Hammal,F., de Langen,P., Bergon,A., Lopez,F. and Ballester,B. (2022) ReMap 2022: a database of Human, Mouse, Drosophila and Arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Research, 50, D316–D325.

5. Sullivan,A.M., Arsovski,A.A., Lempe,J., Bubb,K.L., Weirauch,M.T., Sabo,P.J., Sandstrom,R., Thurman,R.E., Neph,S., Reynolds,A.P., et al. (2014) Mapping and Dynamics of Regulatory DNA and Transcription Factor Networks in A. thaliana. Cell Reports, 8, 2015–2030.

 

 

Application

Deadline: 30 September 2025

Offer published on 18 February 2022, displayed until 16 May 2022