Route de Saint Cyr
Genomics-assisted characterization of plant viruses
The genomes of eukaryotes, including plants, fungi, animals and protists, can contain integrated viral sequences that are retained over extended periods of time, sometimes millions of years. The study of such endogenous viral elements (EVEs), termed "paleovirology", allows the evolution of viruses to be traced, much like a fossil record (Aiewsakun and Katzourakis, 2015). The Caulimoviridae is one of the five families of reverse-transcribing viruses or virus-like retrotransposons that occur in eukaryotes (Pringle, 1998), and is the only family of viruses with a double-stranded DNA genome that infects plants. Unlike retroviruses, Caulimoviridae do not integrate their DNA into the genome of their host to complete their replication cycle. Nevertheless, caulimovirid DNA can occasionally be integrated into the host genome passively. In a recent phylogenomics study, we determined that EVEs from several Caulimoviridae genera are found in virtually all vascular plant genomes, including those of ferns, gymnosperms and all grades of angiosperms, often at high copy number (Geering et al., 2014; Diop et al., 2018). Macroevolutionary analysis of our data allowed us to propose a working scenario in which Caulimoviridae emerged during the Carboniferous period, about 320 million years ago (MYA).
A major conclusion stemming from our work is that, instead of representing a single element, endogenous Caulimoviridae-related sequences often appear to define complex networks of related sequences, with structural and genetic variants that remain to be characterized. For instance, some Caulimoviridae genera appear to have bipartite genomes, meaning that each viral genome is defined by two "chromosomes" that share high local identity and that together encode a functional Caulimoviridae proteome. In fact, the diversity, composition and structure of Caulimoviridae-related sequences present in plant genomes remain to be established, and preliminary analysis suggests the existence of complex assortments of Caulimoviridae EVEs. For example, in the genomes of several plant species such as the oak (Quercus robur), we noticed a very high number of near-identical Caulimoviridae segments corresponding to partial viral genomes, relative to the number of full-length genomes. The fact that such genomic fragments can be more abundant than complete genomes probably reflects either distinct mechanisms of replication and integration or different amounts of the corresponding free DNA in host nuclei. Furthermore, we have observed that full-length or near full-length Caulimoviridae EVE copies often carry [TA] dinucleotide repeats at one or both extremities, and that these repeats can predate viral integration (Geering et al., 2014). This suggests that complete viral genomes may be incorporated as fillers during the repair of fragile sites in genomes. However, whether repeated fragments of viral genomes display similar features at their integration sites remains to be addressed, and the answer could help distinguish the integration mechanisms of complete genomes from those of repetitive fragments.
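As an illustration of how the question of [TA] repeats at integration sites could be approached, the following minimal Python sketch scans the sequence windows flanking an annotated EVE for runs of TA dinucleotides. It is not part of Caulifinder: the function name, the 50 bp flank size and the threshold of four consecutive TA units are assumptions chosen for the example, and the toy sequence is made up.

```python
import re

# Hypothetical helper for illustration only; not part of Caulifinder.
# Assumed threshold: a "repeat" is at least 4 consecutive TA units.
TA_REPEAT = re.compile(r"(?:TA){4,}")

def ta_repeats_in_flanks(genome_seq, start, end, flank=50):
    """Find (TA)n runs in the windows flanking the locus [start, end).

    Coordinates are 0-based and end-exclusive. Returned positions are
    genome coordinates. Runs straddling the locus boundary are only
    reported for the part that lies inside the flanking window.
    """
    seq = genome_seq.upper()
    left_from = max(0, start - flank)
    right_to = min(len(seq), end + flank)
    hits = {"left": [], "right": []}
    # Scan the upstream window and shift matches back to genome coordinates.
    for m in TA_REPEAT.finditer(seq[left_from:start]):
        hits["left"].append((left_from + m.start(), left_from + m.end()))
    # Scan the downstream window in the same way.
    for m in TA_REPEAT.finditer(seq[end:right_to]):
        hits["right"].append((end + m.start(), end + m.end()))
    return hits

# Toy example: a made-up "EVE" flanked by TA repeats on both sides.
left = "GGCC" + "TA" * 6
eve = "ACGGCCGGAACC"
right = "TA" * 5 + "CCGG"
genome = left + eve + right
print(ta_repeats_in_flanks(genome, len(left), len(left) + len(eve)))
```

In a real analysis, the locus coordinates would come from the EVE annotation of a genome assembly, and the flank size and repeat threshold would need to be tuned to the data.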
The goal of this project is to work towards disentangling the complex relationships between the various endogenous Caulimoviridae elements. To enable a large-scale comparative study and to provide a solution for the systematic discovery and annotation of Caulimoviridae EVEs in plant genomes, we have recently developed a tool named Caulifinder. This tool takes a genome assembly as input and performs automated mining and semi-automated characterization of Caulimoviridae-related sequences. During this internship, the student will run Caulifinder on a collection of plant genomes to begin addressing the conserved and variable features of the Caulimoviridae pathosystem for different viral genera and in different groups of plants. This in-depth analysis of EVEs will make it possible to draw novel hypotheses regarding the biology of Caulimoviridae and their impact on host genomes.
Candidates should be curious about evolutionary biology and have a good background in bioinformatics.
URGI at INRA Versailles is a transdisciplinary unit dedicated to genome analysis and data integration. It is composed of over 20 permanent members, including several developers and researchers. The genome analysis team is internationally recognized for its expertise in the annotation and analysis of selfish genetic elements, including transposable elements and endogenous viruses. URGI will provide a friendly and formative environment for the trainee. The INRA Versailles Centre may rent temporary on-site institutional accommodation in a guest house under specific conditions.
1. Aiewsakun, P., and Katzourakis, A. (2015). Endogenous viruses: Connecting recent and ancient viral evolution. Virology 479-480, 26-37.
2. Diop, S.I., Geering, A.D.W., Alfama-Depauw, F., Loaec, M., Teycheney, P.Y., and Maumus, F. (2018). Tracheophyte genomes keep track of the deep evolution of the Caulimoviridae. Sci Rep 8, 572.
3. Geering, A.D., Maumus, F., Copetti, D., Choisne, N., Zwickl, D.J., Zytnicki, M., McTaggart, A.R., Scalabrin, S., Vezzulli, S., Wing, R.A., et al. (2014). Endogenous florendoviruses are major components of plant genomes and hallmarks of virus evolution. Nat Commun 5, 5269.
4. Pringle, C.R. (1998). The universal system of virus taxonomy of the International Committee on Virus Taxonomy (ICTV), including new proposals ratified since publication of the Sixth ICTV Report in 1995. Arch Virol 143, 203-210.