M2 internship: Transcriptomic Analysis using Intensive Randomization

Internship · M2 internship · 6 months · Bac+5 / Master's level · UR 7537 BioSTM, Université de Paris · Paris (France) · Approximately 580 €/month

Start date: 1 February 2022

Keywords

transcriptomics, RNA-seq, differential analysis, graphics processing unit (GPU), data science, computer science

Description

Project 

Next-generation sequencing technologies such as RNA-seq aim to quantify transcription in biological samples and to compare gene expression between experimental conditions. The quantification of the genome alignments stemming from such technologies yields relative measurements, which cannot be compared directly between conditions without adequate data normalization. No consensus has been reached to date on the optimal way to normalize such data (Abrams et al. 2019). Existing methods suffer from practical limitations and may be compromised by the presence of genes with high expression levels or strong variability. In such cases, a single normalization procedure can lead to erroneous results and false conclusions.

To address this, a novel statistical framework for differential analysis has been proposed (Desaulle et al. 2021). It is based on intensive iterative random data normalizations and provides good control of statistical errors. It is implemented in the R package DArand (Desaulle and Rozenholc 2021), which is publicly available from the Comprehensive R Archive Network (CRAN). The current package is written in R and uses only CPU parallelization.

Because of the large data size and the framework's reliance on intensive iterative randomizations, further development of the project requires more advanced programming. More precisely, the iterative procedure is computationally intensive and may rapidly become time-consuming as the size of the transcriptomic experiment and the number of samples grow. The main mission of the internship will therefore be to adapt the code for efficient parallel processing on a graphics processing unit (GPU) using CUDA.

This computational optimization will play an important role in further methodological development. The subsequent contribution will aim to extend the methodology from two to more than two biological conditions, for statistical analyses such as differential analysis, principal component analysis (PCA) and, more generally, unsupervised learning tools. The difficulty here will be to preserve the iterative structure of the procedure, with its data normalization, while combining results from different data analysis approaches. The methodological work, the implementation and the validation will be followed by a real-data application involving miRNA data.


Master-level intern

The successful candidate should hold a master's degree in data science or computer science, with knowledge of statistics, machine learning or AI, and is expected to interact with researchers from the interdisciplinary teams throughout the internship.

In addition, any of the following skills will be considered an advantage:

- good programming skills including GPU computing

- strong interest in biology

- advanced level of English


Teams

The internship will take place in the UR 7537 BioSTM research unit ("Biostatistique, Traitement et Modélisation des données biologiques") at the Faculté de Pharmacie, Université de Paris.

This project is part of a collaboration with the UTCBS research unit (CNRS UMR 8258 - INSERM U 1267) on the identification of miRNAs involved in NASH (non-alcoholic steatohepatitis).

The project will also benefit from a collaboration with computer science researchers from the Data Intelligence Institute of Paris.


References

Abrams, Zachary B., Travis S. Johnson, Kun Huang, Philip R. O. Payne, and Kevin Coombes. 2019. “A Protocol to Evaluate RNA Sequencing Normalization Methods.” BMC Bioinformatics 20 (24): 679. https://doi.org/10.1186/s12859-019-3247-x.

Desaulle, Dorota, Céline Hoffmann, Bernard Hainque, and Yves Rozenholc. 2021. “Differential Analysis in Transcriptomic: The Strength of Randomly Picking ‘Reference’ Genes.” http://arxiv.org/abs/2103.09872.

Desaulle, Dorota, and Yves Rozenholc. 2021. DArand: Differential Analysis with Random Reference Genes. https://CRAN.R-project.org/package=DArand.







Application

Procedure: Please contact dorota.desaulle@u-paris.fr

Deadline: 31 January 2022

Contacts

 Dorota Desaulle, PhD, MCU

dorota.desaulle@u-paris.fr

Offer published on 22 November 2021, displayed until 15 January 2022