Postdoctoral position: mining large RNA sequence databases for medical biomarkers and targets

 Fixed-term contract (CDD) · Postdoc · 24 months    PhD (Bac+8 / Doctorat), Grandes Écoles   Institute for Integrative Biology of the Cell (I2BC) · Gif-sur-Yvette (France)  €2400-2600 net per month

 Start date: 1 October 2022


Transcriptomics RNA-seq NGS classifiers biomarkers cancer aging


High-throughput RNA sequencing (RNA-seq) is a unique tool for the discovery of medical biomarkers and drug targets. However, while over one million human RNA-seq libraries are publicly available, this treasure trove of medical information cannot realize its full potential because it is impossible to directly query this resource to measure the expression of an RNA of interest. Several bioinformatics projects have addressed this issue, but they rely on normal reference RNAs that do not capture the full diversity of transcripts found in disease. New reference-free data structures using k-mers could allow querying of these large sequence databases. However, several improvements are needed to make them true data mining tools for discovering RNAs associated with human diseases.

As part of a newly funded project, we are developing indexing structures that can handle reference-free quantitative queries across tens of thousands of RNA-seq libraries while optimizing disk and memory consumption [1]. Our team is interested in exploiting these large databases to discover novel RNAs significantly associated with qualitative or quantitative traits related to sample phenotype [2,3]. These RNAs can be biomarkers, therapeutic or vaccine targets, or predictive signatures. We are particularly targeting applications in oncology and aging/senescence. Our consortium is composed of bioinformaticians from four institutions with strong experience in computer science, data structures, high-throughput RNA-seq analysis and health transcriptomics.

The postdoctoral researcher will participate in the development of biostatistical tools to extract sequences from the index associated with characteristics of biological interest (age/senescence, pathology, cell type), to generate predictive models from these variables, and to test these models. This will involve the development of model aggregation procedures adapted to the size and heterogeneity of the tables analyzed. The activity will be co-supervised by a biostatistician from I2BC.


We are seeking candidates with a PhD in bioinformatics or computational biology and experience in human genomics or transcriptomics. Candidates must be motivated first and foremost by biological discovery and will therefore need a good background in some of the biological facets of the project: transcription, RNA maturation, somatic and germline mutations, epigenome, cancer, aging. The position also requires an understanding of common machine learning procedures used in transcriptomics (dimension reduction, variable selection, regression methods, cross-validation) and familiarity with good practices in bioinformatics code development (version control, workflow managers, containers). Knowledge of massive database indexing techniques (k-mers, de Bruijn graphs (DBG), Bloom filters) would be an additional asset, but can be learned through immersion in the project team.


The host team, which specializes in RNA bioinformatics, is composed of 5 permanent researchers and faculty members. The postdoctoral fellow will be integrated into a consortium of computer science and bioinformatics laboratories. He/she will participate in the consortium meetings and will benefit from our collaborations within this group.


  1. Marchet C, Iqbal Z, Gautheret D, Salson M, Chikhi R. (2020) REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets. Bioinformatics. 36:i177–i185.
  2. Wang Y, Xue H, Aglave M, Lainé A, Gallopin M, Gautheret D. (2022) The contribution of uncharted RNA sequences to tumor identity in lung adenocarcinoma. NAR Cancer. 4:1.
  3. Nguyen Ha TN, Xue H, Firlej V, Ponty Y, Gallopin M, Gautheret D. (2021) Reference-Free Transcriptome Signatures for Prostate Cancer Prognosis. BMC Cancer. 21:394.


Application procedure: Send CV and cover letter by email

Application deadline: 15 September 2022


Daniel Gautheret

Posted on 8 August 2022, displayed until 15 September 2022