Internship on horizontal transfers in eukaryotes
Internship · M2 internship · 6 months · Bac+5 / Master · URGI · Versailles (France)
Start date: 1 February 2023
Evolution · Gene transfer · Pipeline
Cross-kingdom horizontal gene transfers in eukaryotes
Horizontal gene transfer (HGT) is the passage of genetic material between organisms by means other than reproduction. The patterns, mechanisms, and vectors of HGT are well characterized in prokaryotes, in which these transfers are ubiquitous and a major source of innovation (Soucy et al., 2015). In eukaryotes, HGT has long been considered either anecdotal, because multiple barriers should impede such transfers, or controversial, because reported cases could result from phylogenetic artifacts or contaminant sequences. Yet dozens of recent studies, making use of an increasing number of high-quality genome and transcriptome sequences, have reported robust HGT in various eukaryotic organisms (Leger et al., 2018). Many of these HGT cases played a role in the adaptation of the recipient lineage to a new environmental niche, sometimes underpinning major evolutionary transitions (Van Etten and Bhattacharya, 2020). HGT in eukaryotes can originate from cellular organisms (prokaryotes and eukaryotes) as well as viruses, and it is facilitated by chronic interactions between species. Although HGT is emerging as a very active field of research in evolutionary biology, the extent to which it has shaped evolution and adaptation, compared with other sources of variation, remains largely unexplored.
The rapidly growing number of publicly available genomic sequences from all branches of life makes it possible to improve the resolution of HGT inference and to analyze its pervasiveness across whole taxa and ecosystems. This influx of genomic data is accompanied by the rapid growth of the protein databases used as targets, which increases the number of sequence comparisons to be performed. Recent programs keep pace with this trend and make it possible to address the occurrence of HGT at an unprecedented scale. For instance, a recent study addressed the occurrence of HGT across over 200 insect proteomes representing about 3 million proteins. The authors found HGT to be widespread in insects, and functional analysis of one of the transferred genes revealed that it is essential in controlling courtship behavior in lepidopterans (Li et al., 2022).
The most recent studies have designed automated bioinformatics workflows that share several of their main steps: the identification of candidate genes, the collection of proteins from target databases to test the HGT hypothesis, protein sequence alignment, and the construction of phylogenetic trees. The analysis of these trees makes it possible to confirm or reject each potential HGT event. The main conceptual difference among workflows lies in the first step of this process: the analysis performed to identify candidate proteins among whole proteomes, which largely determines the sensitivity and efficiency of the subsequent steps.
We have recently established a workflow that determines initial candidate proteins using a fast, approximate prediction of their taxonomic origin, following an approach inspired by the taxonomic binning of metagenomic contigs. By using cutting-edge and scalable programs for protein similarity search and phylogenetic analysis, our workflow enables the analysis of large amounts of data. We have recently demonstrated the value of our approach by reporting, for the first time, the widespread transfer of genes of plant origin towards the genome of the whitefly (Bemisia tabaci) (Gilbert and Maumus, 2022).
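To give a flavor of the candidate identification step, here is a minimal Python sketch of a majority-vote prediction of taxonomic origin. It is an illustration under stated assumptions, not the workflow's actual implementation: the input format (a list of kingdoms for the best database hits of each protein), the function names, and the `top_n` and `min_fraction` thresholds are all hypothetical.

```python
from collections import Counter

# Assumed input: for each query protein, the kingdoms of its best database hits,
# ordered by decreasing similarity, with hits to the host species already removed.

def predict_origin(hit_kingdoms, top_n=10, min_fraction=0.8):
    """Return the majority kingdom among the top hits, or None without a clear majority."""
    top = hit_kingdoms[:top_n]
    if not top:
        return None
    kingdom, count = Counter(top).most_common(1)[0]
    return kingdom if count / len(top) >= min_fraction else None

def select_candidates(hits_by_protein, host_kingdom, **kwargs):
    """Keep proteins whose predicted origin is clear and differs from the host kingdom."""
    candidates = {}
    for protein, kingdoms in hits_by_protein.items():
        origin = predict_origin(kingdoms, **kwargs)
        if origin is not None and origin != host_kingdom:
            candidates[protein] = origin
    return candidates

# Toy example for an insect (Metazoa) proteome; protein names are made up.
hits = {
    "prot_A": ["Metazoa"] * 9 + ["Fungi"],        # clearly native
    "prot_B": ["Viridiplantae"] * 10,             # plant-like: candidate
    "prot_C": ["Bacteria"] * 5 + ["Metazoa"] * 5, # ambiguous: not retained
    "prot_D": [],                                 # no hit: not retained
}
candidates = select_candidates(hits, host_kingdom="Metazoa")
print(candidates)
```

Only the candidates retained at this stage would then go through the more expensive alignment and tree-building steps, which is why this first filter weighs so heavily on overall sensitivity and run time.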
While different scalable HGT detection workflows have been designed for internal usage, none is available as a distributed tool. Our HGT detection workflow currently exists as a series of bash commands. Your goal will be to build on this template to produce a turnkey, easy-to-use, and efficient HGT detection pipeline to annotate the genes of exogenous origin in any eukaryotic genome. The main objectives will be as follows:
- Complement the workflow with the ETE toolkit (http://etetoolkit.org/) to automate the tree topology tests
- Wrap the HGT detection workflow into a Snakemake pipeline
- Produce benchmarks and fix the main causes of false positives and false negatives
- Package the pipeline and its running environment in a Docker image for distribution
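As an illustration of what an automated tree topology test might check, the Python sketch below asks whether the sister group of a candidate protein consists only of sequences from the putative donor kingdom. It is a simplified example under stated assumptions: trees are represented as plain nested tuples rather than ETE objects to keep the block self-contained, leaf labels follow a hypothetical `Kingdom|name` convention, the tree is assumed to be correctly rooted, and branch support is ignored.

```python
def leaves(node):
    """Yield all leaf labels under a node (a leaf is a string, a clade is a tuple)."""
    if isinstance(node, str):
        yield node
    else:
        for child in node:
            yield from leaves(child)

def sister_leaves(tree, target):
    """Return the leaf labels of the clades sister to `target`, or None if it is absent."""
    if isinstance(tree, str):
        return None
    if target in tree:  # target is a direct child of this clade
        return [leaf for child in tree if child != target for leaf in leaves(child)]
    for child in tree:
        found = sister_leaves(child, target)
        if found is not None:
            return found
    return None

def kingdom(label):
    """Extract the kingdom from a 'Kingdom|name' leaf label (assumed convention)."""
    return label.split("|", 1)[0]

def supports_hgt(tree, candidate, donor_kingdom):
    """True if the candidate's sister group is non-empty and entirely from the donor kingdom."""
    sisters = sister_leaves(tree, candidate)
    return bool(sisters) and all(kingdom(leaf) == donor_kingdom for leaf in sisters)

# Toy rooted trees with made-up sequence names.
hgt_tree = (
    (("Metazoa|insect_gene1", ("Viridiplantae|plantA", "Viridiplantae|plantB")),
     "Viridiplantae|plantC"),
    "Bacteria|outgroup",
)
vertical_tree = (
    ("Metazoa|insect_gene2", "Metazoa|aphid"),
    ("Viridiplantae|plantA", "Viridiplantae|plantB"),
)

hgt_supported = supports_hgt(hgt_tree, "Metazoa|insect_gene1", "Viridiplantae")
vertical_supported = supports_hgt(vertical_tree, "Metazoa|insect_gene2", "Viridiplantae")
print(hgt_supported, vertical_supported)
```

A production test would likely also account for rooting, branch support, and contamination checks, which are the kinds of false-positive sources the benchmarking objective is meant to expose.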
This tool will be very useful to the scientific community to better understand the breadth and evolutionary significance of gene flow in eukaryotes. Our research focuses on cross-kingdom transfers towards eukaryotes and their impact on host-parasite relationships, especially in the phytobiome ecosystem. While this project is geared towards development tasks, it is intended to be followed by the large-scale application of the workflow to address these fundamental questions, in the context of a thesis project that will be proposed in 2023.
URGI is located at the INRAE Centre in Versailles. It is a transdisciplinary unit dedicated to genome analysis and data integration. It is composed of about 15 permanent members, including several developers, engineers, and researchers. Some affordable guestrooms in the INRAE Centre and within walking distance may be available if needed.
The prospective student should be enrolled in an M2 program or equivalent in bioinformatics. They should have a strong interest in evolutionary biology and should be proactive. The working language will be French or English.
Li, Y., Liu, Z., Liu, C., Shi, Z., Pang, L., Chen, C., Chen, Y., Pan, R., Zhou, W., Chen, X.X., et al. (2022). HGT is widespread in insects and contributes to male courtship in lepidopterans. Cell 185, 2975-2987 e2910.
How to apply: send an email to firstname.lastname@example.org
Deadline: 31 December 2022
Offer published on 12 September 2022, displayed until 31 December 2022