Assessing the link between the clinical impact of genetic variants and their effect on transcription

Internship · M2 internship · 6 months · Bac+4 · CNRS - IGMM/LIRMM · MONTPELLIER (France)

Start date: 1 February 2022

Keywords

statistical learning, regulatory genomics, genetic variants, non-coding RNAs.

Description

Assessing the link between the clinical impact of genetic variants and their effect on transcription initiation at microsatellites.

As part of the international FANTOM consortium, which aims to better characterize the human non-coding transcriptome, our team recently discovered that a significant fraction of transcription start sites, as mapped by cap analysis of gene expression (CAGE) [1–2], are located at microsatellites [3], also called short tandem repeats (STRs). STRs are repeated DNA motifs of 2 to 6 bp and are among the most polymorphic and abundant repetitive elements [4], with a wide impact on gene expression [6–8] through various molecular mechanisms [5].

We trained sequence-based convolutional neural networks (CNNs) that predict transcription initiation at STRs with high accuracy [3], providing an unprecedented means of evaluating the impact of genetic variants on this process. We also showed that genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation levels [3], supporting the clinical relevance of transcription initiation at STRs.
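As a rough illustration of how a sequence-based model can be used to evaluate a variant, the sketch below scores a reference sequence and its mutated counterpart and reports the difference. It is a minimal sketch under stated assumptions: `one_hot`, `apply_substitution`, `variant_effect` and `toy_model` are hypothetical names, and `toy_model` is a trivial placeholder (GC fraction) standing in for a trained network, not the CNN described in [3].

```python
from typing import Callable, List

BASES = "ACGT"  # column order of the one-hot encoding


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as one-hot rows (A, C, G, T).

    Any other character (e.g. N) becomes an all-zero row.
    """
    return [[int(base == b) for b in BASES] for base in seq.upper()]


def apply_substitution(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based position pos changed from ref to alt."""
    if seq[pos].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def toy_model(encoded: List[List[int]]) -> float:
    """Placeholder for a trained model (assumption): fraction of C/G positions."""
    return sum(row[1] + row[2] for row in encoded) / len(encoded)


def variant_effect(
    model: Callable[[List[List[int]]], float],
    seq: str,
    pos: int,
    ref: str,
    alt: str,
) -> float:
    """Predicted score of the alternate sequence minus that of the reference."""
    alt_seq = apply_substitution(seq, pos, ref, alt)
    return model(one_hot(alt_seq)) - model(one_hot(seq))


# Example: an A>G substitution inside a (CA)n repeat
delta = variant_effect(toy_model, "CACACACA", 1, "A", "G")
```

With a real model, `toy_model` would be replaced by a call to the trained network's prediction function, and the sequence would be the genomic window around the STR.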

The candidate will (i) validate the link between the pathogenicity of genetic variants and their impact on transcription initiation at STRs by integrating the output of our models with the ClinVar database [9] and (ii) develop a user-friendly web interface that will facilitate the interrogation of genetic variants observed in patient genomes for clinical routine.

The candidate will work in a multidisciplinary team (combining biology, computer science and statistics) and in a very active international environment (a CNRS International Associated Laboratory, joint with UBC in Vancouver, Canada, the FANTOM consortium based at RIKEN Yokohama, Japan, as well as the SANOFI R&D Translational Sciences Unit). She/he should have a good knowledge of programming and statistical learning, with an interest in genetics and genomics.

This work can be the first step of a PhD project aimed at identifying relevant variations observed in patient genomes, functionally evaluating their impact and unveiling potential links with human diseases. We therefore seek highly motivated students, eager to learn and discover, ready to take the doctoral school exams and to apply for other sources of PhD funding. Individual qualities such as adaptability, perseverance, creativity and teamwork are expected.
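To give an idea of what task (i) above might involve, the sketch below compares predicted effect sizes between variants labelled "Pathogenic" and "Benign" using a simple permutation test on the difference of means. This is only one possible approach, not the team's protocol: the function name `permutation_p_value` is hypothetical, and the `records` values are made-up numbers for illustration, not ClinVar data or model outputs.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the variants
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Made-up (clinical significance, |predicted change|) pairs, for illustration only
records = [
    ("Pathogenic", 0.42), ("Pathogenic", 0.38), ("Pathogenic", 0.51),
    ("Pathogenic", 0.47), ("Pathogenic", 0.40),
    ("Benign", 0.05), ("Benign", 0.11), ("Benign", 0.08),
    ("Benign", 0.02), ("Benign", 0.09),
]

pathogenic = [score for label, score in records if label == "Pathogenic"]
benign = [score for label, score in records if label == "Benign"]
p_value = permutation_p_value(pathogenic, benign)
```

A permutation test is used here because it makes no assumption about the distribution of the predicted scores; a rank-based test would be an equally reasonable choice.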

REFERENCES

1. Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl. Acad. Sci. U.S.A. 100, 15776–15781 (2003).

2. Forrest, A. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

3. Grapotte, M. et al. Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. Nat. Commun. 12, 1–18 (2021).

4. Willems, T., Gymrek, M., Highnam, G., Mittelman, D. & Erlich, Y. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014).

5. Bagshaw, A. T. Functional mechanisms of microsatellite DNA in eukaryotic genomes. Genome Biol. Evol. 9, 2428–2443 (2017).

6. Gymrek, M. et al. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat. Genet. 48, 22–29 (2016).

7. Quilez, J. et al. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res. 44, 3750–3762 (2016).

8. Jakubosky, D. et al. Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nat. Commun. 11, 2927 (2020).

9. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–868 (2016).


Application

Procedure: Send a cover letter, CV and references to charles.lecellier@igmm.cnrs.fr

Deadline: 8 October 2021

Contacts

Charles LECELLIER

charles.lecellier@igmm.cnrs.fr

Offer published on 24 September 2021, displayed until 31 October 2021