Study of the regulatory elements of gene expression in humans

General information
Thesis/HDR details
Cédric Notredame
Thérèse COMMES
Raphaël MOURAD
Jean-Christophe ANDRAU
Supervisor (for theses)
Charles-Henri LECELLIER
Abstract in English
Genome expression is tightly controlled by different
regulatory regions to provide a wide variety of cell types and
functions. Identifying these regulatory regions and their characteristics,
and understanding how they interact with each other in a tissue-specific
manner, is of prime importance. This knowledge should help better
understand the impact of genomic variants often located in non-coding
regions. Besides, cancer development is invariably linked to
deregulation of gene expression controls. To pave the way for targeted
treatments and precision medicine, it is important to understand how
all this machinery is orchestrated. To answer this question, several
approaches have been developed, most of them based on experimental data on
histone modifications, methylation and transcription factors (TFs).
However, these data are limited to specific samples and cannot be
generated for all the regulators and all the patients. First, my
thesis research aimed at modeling gene expression based on DNA
sequence only. We used a linear model with variable selection, which
performs comparably to non-parametric methods and is easy to
interpret. This model allowed me to compare several types of
variables derived from the DNA sequence, such as TF binding motifs and
nucleotide composition. These variables were computed for various gene
regions to estimate their regulatory power and contribution.
Strikingly, introns, for which nucleotide composition reflects gene
environment, appear to explain an important part of gene expression
variation. Furthermore, we demonstrated that topologically associating
domains (TADs), within which interactions are favored, share similar genomic
compositions. Our prediction model presumably captures, for every
individual, the composition of active TADs. The second part of my work
focused on regulation occurring in introns. The international FANTOM
consortium provided one of the most extensive atlases of transcription
start sites (TSSs), and we noticed that the majority of these TSSs are
detected in non-coding regions, in particular introns. We thus
investigated these intronic TSSs. To determine if these TSSs are
functional, we searched for new potential regulatory motifs in the
vicinity of these transcription signals. We found that a fraction of
them are located 2 bases downstream of a run of Ts. Biochemical
and genetic evidence suggests that at least part of these signals
correspond to sense-intronic long non-coding RNAs, which are expressed
in a tissue-specific manner. The length of the T repeat also
appears to govern the presence of a transcription signal at these loci
and to indirectly affect host gene expression. These findings provide
one possible molecular explanation for the effect of these short
tandem repeats of Ts.
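
As an illustration of the motif described in the abstract (a TSS located 2 bases downstream of a run of Ts), the following minimal sketch measures the length of the T run that ends just upstream of a given TSS. It is not the method used in the thesis. The function name `t_run_length_upstream`, the 0-based indexing, and the reading of "2 bases downstream" as "the last T sits at index tss - 2" are all assumptions made for this example.

```python
def t_run_length_upstream(seq: str, tss: int, offset: int = 2) -> int:
    """Length of the run of Ts whose last base is at index tss - offset.

    Assumptions (hypothetical, not taken from the thesis): `seq` is the
    sense-strand DNA sequence, `tss` is a 0-based index into it, and the
    run must end exactly at tss - offset. Returns 0 when no such run exists.
    """
    seq = seq.upper()
    end = tss - offset  # index where the last T of the run is expected
    if end < 0 or end >= len(seq):
        return 0
    # If the next base is also a T, the run does not end at `end`.
    if end + 1 < len(seq) and seq[end + 1] == "T":
        return 0
    length = 0
    i = end
    # Walk upstream while we keep seeing Ts.
    while i >= 0 and seq[i] == "T":
        length += 1
        i -= 1
    return length


# Toy sequence: 9 Ts at indices 3..11, followed by G (12) and A (13).
example = "ACG" + "T" * 9 + "GACGT"
print(t_run_length_upstream(example, tss=13))
```

Applied to each intronic TSS, a function like this would give the repeat length as a per-locus variable that can be compared with the presence or absence of a transcription signal.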