SFBI Prize for the best oral presentation at Jobim 2019

Congratulations to Pierre Morisse, who won this first prize awarded by the SFBI!

His presentation was titled: CONSENT: Scalable self-correction of long reads with multiple sequence alignment.


Pierre's background:

I come from a purely computer science background. My Master's degree in Theoretical Computer Science and my particular interest in text algorithms led me to a final-year internship focused on bioinformatics. I then followed this internship with a PhD in the TIBS team of the LITIS laboratory, in Rouen. The work carried out during my PhD mainly deals with error correction of third-generation sequencing data, as well as the impact of this correction step on analysis results, and more particularly on assembly.

Abstract of his presentation:

Third-generation sequencing technologies such as Pacific Biosciences and Oxford Nanopore allow the sequencing of long reads of tens of kbp, which are expected to solve various problems, such as contig and haplotype assembly, scaffolding, and structural variant calling. However, they also reach high error rates of 10 to 30%, and thus require efficient error correction. As the first long-read sequencing experiments produced reads displaying error rates higher than 15% on average, most methods relied on the complementary use of short-read data to perform correction, in a hybrid approach. However, these sequencing technologies evolve fast, and the error rate of the long reads is now capped at around 10-12%. As a result, self-correction is now frequently used as a first step of third-generation sequencing data analysis projects. Efficient tools for self-correction of long reads are now available, and recent observations suggest that avoiding the use of second-generation sequencing reads could bypass their inherent bias. We introduce CONSENT, a new method for the self-correction of long reads that combines different strategies from the state of the art. In particular, a multiple sequence alignment strategy is combined with the use of local de Bruijn graphs. Moreover, the multiple sequence alignment benefits from an efficient segmentation strategy based on k-mer chaining, which greatly reduces its time footprint. Our experiments show that CONSENT compares well to or outperforms the latest state-of-the-art self-correction methods on real Oxford Nanopore datasets. In particular, they show that CONSENT is the only method able to scale to a human dataset containing Oxford Nanopore ultra-long reads, reaching lengths up to 340 kbp. CONSENT is freely available at https://github.com/morispi/CONSENT.
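
To give a rough idea of what k-mer chaining means, here is a minimal Python sketch. It is not CONSENT's implementation: the function names `shared_kmer_anchors` and `chain_anchors`, the restriction to k-mers that occur exactly once in each sequence, and the use of a longest increasing subsequence as the chaining criterion are all assumptions made for illustration. The idea is that k-mers shared by two reads act as anchors, and the longest collinear chain of anchors splits the reads into short segments that are cheaper to align than the full sequences.

```python
from bisect import bisect_left


def shared_kmer_anchors(a, b, k):
    """Return sorted (pos_a, pos_b) pairs for k-mers unique in both a and b."""
    def unique_positions(s):
        seen = {}
        for i in range(len(s) - k + 1):
            seen.setdefault(s[i:i + k], []).append(i)
        # Keep only k-mers occurring exactly once (illustrative simplification).
        return {kmer: pos[0] for kmer, pos in seen.items() if len(pos) == 1}

    ua, ub = unique_positions(a), unique_positions(b)
    return sorted((ua[kmer], ub[kmer]) for kmer in ua if kmer in ub)


def chain_anchors(anchors):
    """Longest chain of anchors strictly increasing in both coordinates.

    Expects anchors sorted by pos_a with distinct pos_a and pos_b values,
    so the problem reduces to a longest increasing subsequence on pos_b.
    """
    tails = []      # tails[i]: index of the anchor ending the best chain of length i + 1
    tail_vals = []  # pos_b of those anchors, kept sorted for binary search
    prev = [-1] * len(anchors)
    for idx, (_, pos_b) in enumerate(anchors):
        j = bisect_left(tail_vals, pos_b)
        if j == len(tails):
            tails.append(idx)
            tail_vals.append(pos_b)
        else:
            tails[j] = idx
            tail_vals[j] = pos_b
        prev[idx] = tails[j - 1] if j > 0 else -1

    # Walk back from the end of the longest chain.
    chain = []
    idx = tails[-1] if tails else -1
    while idx != -1:
        chain.append(anchors[idx])
        idx = prev[idx]
    return chain[::-1]


if __name__ == "__main__":
    read_a = "ACGTTGCAAGGCT"
    read_b = "ACGTTCCAAGGCT"  # one substitution relative to read_a
    anchors = shared_kmer_anchors(read_a, read_b, k=4)
    print(chain_anchors(anchors))
```

In this toy example the chain skips the k-mers that overlap the substitution, so only the short region between consecutive anchors would need a detailed alignment.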