SFBI prizes for the two best posters at JOBIM 2017

Congratulations to Aurélien Birer and Pierre Andrieu, who won the poster prizes awarded by the SFBI!


Aurélien's poster:

Since the release of the Oxford Nanopore Technologies (ONT) MinION sequencer in 2014, the number of reads produced with this new sequencing technology has kept increasing. All the protocols remain under very active development (ONT updates its chemistry and bioinformatics tools every 2-3 months). Hence, bioinformatics tools must be kept up to date as the MinION technology evolves. The IBENS (Institut de Biologie de l'École normale supérieure) genomic facility is currently developing Toullig, a new data analysis workflow for RNA-Seq experiments using ONT sequencing output. This pipeline is based on Eoulsan and its bundled RNA-Seq pipeline for Illumina reads.
The final goals of Toullig are to perform differential expression analysis from ONT long reads and to produce a reference transcriptome by combining data from both Illumina and ONT technologies. In this poster, we present an RNA-Seq analysis for long reads built on Toullig steps such as long-read mapping and quality control of the mapping. The new Eoulsan modules for Toullig and the toolbox for manipulating ONT data are available on GitHub.

Aurélien's background:

My university studies have revolved entirely around bioinformatics. I started with the DUT "Génie Biologique option Bio-informatique" at the Université d'Auvergne, based in Aurillac, which was a revelation for me about bioinformatics. I continued with a bachelor's degree (licence) in "Génie Biologique et Informatique", followed by the "GENIOMHE" master's (GEnomics Informatics and Mathematics for Health and Environment) at the Université d'Evry Val d'Essonne.

Pierre's poster:

The aim of biological data ranking is to help users faced with huge amounts of data choose between alternative pieces of information. This is particularly important when querying biological data integration systems, where even very simple queries can return thousands of answers. For instance, searching for the set of human genes involved in breast cancer returns thousands of answers in the reference database EntrezGene, without any ranking in terms of importance. Ranking solutions able to order answers are crucial for helping scientists organize their time and prioritize the new experiments they might conduct. However, ranking biological data is a difficult task for various reasons: biological data are usually annotation files that reflect expertise, so they may be associated with various degrees of confidence; the needs expressed by scientists must also be taken into account, for example whether the best-known data should be ranked first, or the freshest, etc. As a consequence, although several ranking methods have been proposed in recent years within the bioinformatics community, none of them has been deployed on systems currently in use.

The approach we propose is to rank biological data in two steps. First, several ranking methods are applied to the biological data (results are ordered using alternative ranking criteria). Second, we use consensus ranking methods that reflect the input rankings' common points while not giving too much weight to elements classified as "good" by only one or a few rankings. The problem, known as the median problem for a set of rankings, is NP-hard. However, since providing a consensus ranking is a crucial need for big biological data sets, designing scalable algorithms is both necessary and highly challenging. Besides, the problem has mainly been studied in the case of permutations, where elements are strictly ordered, while in real applications some elements may be placed at the same position (considered equally important). The challenge is then to design an algorithm that computes one consensus ranking from a set of rankings with ties.
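To make the median problem concrete, here is a minimal Python sketch of how a candidate consensus can be scored against a set of rankings with ties. It is not the algorithm from the poster. It assumes a generalized Kendall-tau distance, one common choice for rankings with ties, and the function names, the `tie_penalty` parameter and the toy data are all hypothetical. A ranking is written as a list of buckets, where elements in the same bucket are tied.

```python
from itertools import combinations


def positions(ranking):
    """Map each element to the index of its bucket (tied elements share an index)."""
    return {elem: i for i, bucket in enumerate(ranking) for elem in bucket}


def generalized_kendall_tau(r1, r2, tie_penalty=1.0):
    """Count pairwise disagreements between two rankings with ties.

    Assumption: a pair ordered in opposite ways costs 1, and a pair tied in
    one ranking but not in the other costs `tie_penalty`.
    """
    p1, p2 = positions(r1), positions(r2)
    if p1.keys() != p2.keys():
        raise ValueError("rankings must cover the same elements")
    dist = 0.0
    for a, b in combinations(list(p1), 2):
        d1 = p1[a] - p1[b]
        d2 = p2[a] - p2[b]
        if d1 * d2 < 0:                  # strictly opposite order
            dist += 1.0
        elif (d1 == 0) != (d2 == 0):     # tied in exactly one ranking
            dist += tie_penalty
    return dist


def consensus_score(candidate, rankings, tie_penalty=1.0):
    """Sum of distances from the candidate to every input ranking.

    A median (consensus) ranking is a candidate that minimizes this sum.
    """
    return sum(generalized_kendall_tau(candidate, r, tie_penalty) for r in rankings)


# Toy input: three rankings of genes A, B, C (hypothetical data).
rankings = [
    [{"A"}, {"B", "C"}],      # A first, then B and C tied
    [{"A"}, {"B"}, {"C"}],
    [{"B"}, {"A"}, {"C"}],
]
candidate = [{"A"}, {"B"}, {"C"}]
print(consensus_score(candidate, rankings))
```

Scoring one candidate is cheap (quadratic in the number of elements per input ranking). The difficulty lies in searching the space of all possible rankings with ties for the candidate with the lowest score.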

We introduce a new algorithm computing a consensus ranking from a set of rankings with ties. The originality of our approach lies in providing an efficient solution that (i) is based on a graph decomposition of the data sets to partition them efficiently and (ii) has several interesting and fundamental properties, which make it possible to evaluate the relevance of a given solution and to provide the exact consensus in many cases. A set of experiments has been conducted on several hundred biological and synthetic data sets. First results appear to be very promising: our algorithm is able to compete with the best currently available algorithms while being efficient enough to be used in real settings, in particular as the algorithm used on

Pierre's background:

Shortly after passing the pharmacy entrance exam, I turned to a bachelor's degree in mathematics and then a master's in bioinformatics. I am currently an intern at Université Paris-Sud, working on rank aggregation applied to biological data, and I am about to start a PhD that continues the internship topic.