M2 internship in bioinformatics

 Internship · M2 internship · 6 months · Bac+5 / Master's · Center for Computational Biology · Paris (France)


Horizontal gene transfer · genomic sequence analysis · metagenomics


Identifying the Determinants of Genetic Exchange between Bacteria

    The ability of bacteria belonging to different species and living in different environments to exchange genetic material via Horizontal Gene Transfer (HGT) is a key mechanism of microbial evolution and constitutes an important source of genetic novelty for microbial species. On short timescales, it is the primary reason for the spread of antibiotic resistance in bacteria [1] and plays an important role in the evolution of virulence [2]. As such, it is of crucial importance for human health, but remains poorly studied. In particular, because identifying transfer events between distant species is computationally very costly, only a small number of these events have been documented. Because of this partial knowledge, the biological conditions that make these exchanges possible and influence the rates at which species exchange material are not well understood.

    We recently developed a method to identify HGT in bacteria [3]. The principle of our method is that long DNA sequences (>300 bp) that are exactly identical between pairs of bacteria of different species are extremely unlikely to result from classical vertical inheritance. Long exact matches can be discovered very efficiently, which allows the method to be applied to extremely large datasets. Moreover, we found that the length distribution of these exact matches follows a power law. Fitting this power law allowed us to estimate the transfer rate between all pairs of genera for which the genomes of enough species have been sequenced. However, our method suffers from several drawbacks. In particular, it is only efficient at detecting recent transfer events, identifies only a small part of the full transferred sequence, and cannot distinguish the donor species from the recipient species. Finally, the statistical method we used to estimate the transfer rate relies strongly on a good fit of the match length distribution to a power law, leading to poor estimates when the distribution deviates from the theoretical expectation.
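As an illustration of the fitting step described above, here is a minimal sketch that estimates the exponent of a power-law tail from a list of match lengths. It assumes a distribution of the form p(L) ~ L^(-alpha) for L above a cutoff and uses the standard maximum-likelihood estimator for a continuous power law, which is only an approximation for integer lengths. This is not necessarily the estimator used in [3]; the function names, the synthetic data and the test exponent of 3.0 are hypothetical and chosen for illustration only.

```python
import math
import random


def fit_power_law_exponent(lengths, min_length=300):
    """Maximum-likelihood estimate of alpha for a tail p(L) ~ L**(-alpha), L >= min_length.

    Uses the continuous power-law estimator alpha = 1 + n / sum(ln(L / min_length)).
    Returns the estimate and its approximate standard error.
    """
    tail = [length for length in lengths if length >= min_length]
    log_sum = sum(math.log(length / min_length) for length in tail)
    if log_sum == 0:
        raise ValueError("need at least one match longer than min_length")
    alpha = 1.0 + len(tail) / log_sum
    std_err = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, std_err


def sample_power_law(n, alpha, min_length=300, seed=0):
    """Draw n synthetic match lengths by inverse-transform sampling (for testing only)."""
    rng = random.Random(seed)
    return [min_length * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic check: the exponent 3.0 is an arbitrary test value, not a result from the source.
synthetic = sample_power_law(20000, alpha=3.0)
alpha_hat, err = fit_power_law_exponent(synthetic)
print(f"estimated exponent: {alpha_hat:.2f} +/- {err:.2f}")
```

An estimator of this kind is only meaningful when the tail really follows a power law, which is the limitation noted above: when the observed distribution deviates from that form, the estimate degrades.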

    The goal of the internship will be to develop bioinformatic and statistical methods to overcome these issues. First, using the discovered long exact matches as anchor seeds, we will be able to use classical alignment tools to identify the entire transferred sequence, in order to identify all the genes that take part in the transfer and to date the event more accurately. Using the full transferred sequence, the intern will then develop a method to assess the direction of the transfer using phylogenetic methods. Next, the intern will work on developing a reliable method to estimate HGT rates, using theoretical arguments and validating the results on simulated data. Finally, all these developments will be used to study genetic exchanges between species living in different environments using metagenomic datasets.
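To make the idea of using an exact match as an anchor seed concrete, the sketch below extends a seed in both directions without gaps, stopping once the score falls too far below its best value (an X-drop rule, as used in seed-and-extend aligners). This is a simplified stand-in for the classical alignment tools mentioned above, not the method that will be used in the project. The function name, the scoring values and the toy sequences are assumptions made for illustration.

```python
def extend_seed(seq_a, seq_b, start_a, start_b, seed_len,
                match=1, mismatch=-3, x_drop=10):
    """Extend an exact-match seed in both directions without gaps.

    Returns (begin_a, begin_b, length) of the extended region, which
    may contain mismatches but no insertions or deletions.
    """
    def extend(step):
        # step=+1 extends to the right of the seed, step=-1 to the left.
        if step == 1:
            i, j = start_a + seed_len, start_b + seed_len
        else:
            i, j = start_a - 1, start_b - 1
        score = best = 0
        ext = best_ext = 0
        while 0 <= i < len(seq_a) and 0 <= j < len(seq_b):
            score += match if seq_a[i] == seq_b[j] else mismatch
            ext += 1
            if score > best:
                # Remember the extension length that gave the best score.
                best, best_ext = score, ext
            elif best - score > x_drop:
                # Score dropped too far below the best: stop extending.
                break
            i += step
            j += step
        return best_ext

    left = extend(-1)
    right = extend(+1)
    return start_a - left, start_b - left, left + seed_len + right


# Toy example: a 12-base shared region embedded in unrelated flanks.
shared = "AAGGTTCCAAGG"
genome_a = "CCCCC" + shared + "TTTTT"
genome_b = "GGG" + shared + "AAAAA"
# The seed is the exact match "TTCC" found at offset 4 of the shared region.
begin_a, begin_b, length = extend_seed(genome_a, genome_b, 9, 7, 4)
print(genome_a[begin_a:begin_a + length])
```

A real pipeline would more likely rely on a gapped aligner, since transferred regions can accumulate insertions and deletions as well as substitutions.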

Preferred Experience/Educational Background:

• Second year of a Master's degree (or equivalent engineering degree) in Bioinformatics or Applied Statistics

• Experience in programming (R or Python)

• Ideally, some experience with Unix and high-performance computing clusters


[1] Huddleston, J.R. Infection and Drug Resistance (2014). https://doi.org/10.2147/IDR.S48820

[2] Gyles, C., et al. Veterinary Pathology (2013). https://doi.org/10.1177/0300985813511131

[3] Sheinman, M., et al. eLife (2021). https://doi.org/10.7554/eLife.62719 


Application procedure: Interested candidates should send a detailed CV and a cover letter to Florian Massip (florian.massip@mines-paristech.fr).

Deadline: None


Florian Massip


Offer published on 1 October 2021, displayed until 29 November 2021