PhD in Computational Regulatory Genomics

Competitive entry (concours) · PhD thesis · 36 months · Bac+5 / Master's degree · TAGC Inserm U1090 · Marseille (France)

Start date: 1 October 2024

Keywords

Bioinformatics, Deep Learning, Enhancer, Regulatory regions, ChIP-seq, RNA-seq, Machine Learning, Data-mining, Transcription

Description

Exploration of Intergenic Transcription: Unveiling Regulatory Complexity through Bioinformatics and Machine Learning


Summary of thesis project:

Our research focuses on non-coding regions of the genome, specifically regulatory elements such as enhancers, which are bound by transcription factors. Understanding the cis-regulatory code of DNA, which dictates gene expression, is an active field with diverse applications in genetics and oncology. While genes and their protein products have traditionally been the primary focus of research, there is growing evidence from experimental genomic data supporting the biological and clinical importance of intergenic transcription.

Recently, by compiling omics data (RNA-seq and ChIP-seq), we and others have shown that transcription detected in intergenic regions covers a much larger fraction of the human genome than expected [1–3]. We created an atlas of intergenic regions bound by RNA Polymerase 2 [1,4]. Furthermore, we showed that transcriptional signals (from sequencing reads) can be detected in those intergenic regions in both normal tissues and cancer samples. These regions show enhancer-like characteristics, and some of them can be associated with cancer genes and survival.
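To make the notion of an "intergenic" Polymerase 2 region concrete, here is a minimal sketch that keeps only the ChIP-seq peaks overlapping no annotated gene. It assumes that peaks and genes are given as half-open (start, end) coordinates on a single chromosome. The function names `merge_intervals` and `intergenic_peaks` are hypothetical and do not describe the pipeline used to build the atlas.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intergenic_peaks(peaks, genes):
    """Return the peaks that overlap no gene interval (one chromosome assumed)."""
    merged = merge_intervals(genes)
    starts = [start for start, _ in merged]
    kept = []
    for p_start, p_end in peaks:
        # Index of the last merged gene that starts before the peak ends.
        i = bisect_right(starts, p_end - 1) - 1
        # Merged genes are disjoint and sorted, so only that gene can overlap.
        if i >= 0 and merged[i][1] > p_start:
            continue
        kept.append((p_start, p_end))
    return kept


# Toy coordinates, purely illustrative.
genes = [(100, 200), (150, 300), (500, 600)]
peaks = [(10, 50), (190, 210), (300, 400), (590, 700), (600, 650)]
print(intergenic_peaks(peaks, genes))
```

In practice this kind of filtering is usually done per chromosome with a dedicated interval library; the pure-Python version above only illustrates the logic.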

We would like to extend the work we initiated by focusing on three aspects:

●  Characterising the transcribed RNA products from those intergenic regions.

●  Extending the identification of intergenic transcription to larger datasets, encompassing more cell types.

●  Developing new models with improved predictive power for non-coding regions.


Objectives:

The objective of the thesis is to investigate the biological significance of intergenic transcription at regulatory regions using omics integration, bioinformatics and machine learning approaches and to provide tools to better interpret the impact of genetic variants associated with this transcription.

Specifically, we aim to:

●  Characterise the transcribed RNA products originating from intergenic regions identified as bound by RNA Polymerase 2. For this objective, we plan to use read assembly methods for short or long transcript reconstruction using high-quality RNA-seq data, such as the GTEx [5], TCGA [6] or FANTOM expression catalogues.

●  Expand the identification of intergenic transcription to larger datasets, encompassing a wider variety of cell types. We will leverage the resources of the recount3 project [7], which has processed over 316,000 human RNA-seq runs. This project provides bigWig files with base-pair resolution coverage, enabling the detection of non-coding signals in underrepresented tissues.

●  Develop advanced predictive models to analyse non-coding regions by incorporating insights from our characterisation efforts. Beyond established genome segmentations, such as enhancer/promoter regions, we aim to automatically identify new classes of regulatory sequences through integrated analysis of epigenetic data. This involves categorising regulatory regions based on epigenetic marks and creating tailored predictive models for each category, using data such as RNA Polymerase 2 ChIP-seq.
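As a rough illustration of the second objective, the sketch below scans per-base coverage values and reports contiguous stretches above a threshold. The coverage is assumed to have already been read from a bigWig file into a plain Python list. The function name `transcribed_runs` and the thresholds `min_cov` and `min_len` are hypothetical choices for illustration and are not part of the project or of recount3.

```python
def transcribed_runs(coverage, min_cov=1.0, min_len=3, offset=0):
    """Return half-open (start, end) runs where coverage stays >= min_cov.

    coverage: per-base signal for one genomic window (assumed input).
    min_len:  shortest run, in bases, that is reported.
    offset:   genomic coordinate of coverage[0].
    """
    runs = []
    run_start = None
    for i, value in enumerate(coverage):
        if value >= min_cov:
            if run_start is None:
                run_start = i  # a new run opens here
        elif run_start is not None:
            # The run closes at position i (exclusive).
            if i - run_start >= min_len:
                runs.append((offset + run_start, offset + i))
            run_start = None
    # Close a run that extends to the end of the window.
    if run_start is not None and len(coverage) - run_start >= min_len:
        runs.append((offset + run_start, offset + len(coverage)))
    return runs


# Toy coverage vector, purely illustrative.
signal = [0, 0, 2, 3, 5, 0, 1, 0, 4, 4, 4, 4]
print(transcribed_runs(signal, min_cov=1, min_len=3, offset=1000))
```

Real analyses would normalise coverage across runs and choose thresholds from the data; fixed cut-offs are used here only to keep the example short.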

The challenge of the thesis lies in the quantity of data to process (“big data”) combined with the complexity of the models to develop. A bioinformatician or computer scientist with a strong background in biology and a real appetite for biological complexity would be an ideal fit for this PhD.

The biological objectives are central to this project, which goes beyond simply cataloguing intergenic regions. Our goal is to employ bioinformatics and computational methods as a toolbox to address and uncover new aspects of genome complexity.


Context:

This thesis will be funded by a Ministry grant and will start on October 1st, 2024. The thesis will take place at the TAGC Inserm U1090 on the Luminy campus, Marseille, France, and will be supervised by Benoit Ballester (TAGC, Marseille) and Charles Lecellier (IGMM & LIRMM, Montpellier). The successful candidate will have to apply to the EDSV62 Doctoral School competition for a thesis grant. Opening of applications: March 18, 2024. Deadline: May 13, 2024. Competition dates: June 19th to 21st, 2024.


Supervision:

The PhD will be supervised by Benoit Ballester (INSERM, Marseille) and Charles Lecellier (IGMM & LIRMM, Montpellier). The PhD student will be based at TAGC Marseille, with regular visits to Montpellier.

 

How to apply:

Candidates should first apply by email (benoit.ballester@inserm.fr & charles.lecellier@igmm.cnrs.fr) with the following:

● CV

●  Cover Letter

●  Marks (Master 1 and Master 2, as well as current rank)

●  Reports from previous academic placement (e.g. Master 1 placement)

After pre-selection, only one applicant will be put forward for the Doctoral School competition.
Deadline for receipt of applications: 13 May 2024
Competition dates: June 19-21, 2024
Competition format: 10-minute oral presentation, followed by 15 minutes of questions from the jury (10-14 members).

 

Contact:

Dr Benoit Ballester, benoit.ballester@inserm.fr
Inserm U1090, TAGC, Campus de Luminy, 13288 Marseille Cedex 9, France

Dr Charles Lecellier, charles.lecellier@igmm.cnrs.fr
Institut de Génétique Moléculaire de Montpellier, CNRS-UMR 5535, 1919 Route de Mende, 34293 Montpellier Cedex 5, France

Application

Deadline: 13 May 2024

Offer published on 8 April 2024, displayed until 31 May 2024