M2 Internship - Dimensionality Reduction Techniques for Human Gut Microbiome Data

Internship · M2 internship · 6 months · Bac+5 / Master · Centre Inria de l'université de Bordeaux · Talence (France) · Standard internship stipend

Start date: Jan. 12, 2026

Keywords

dimensionality reduction, microbiome, machine learning

Description

Six-month internship starting in January or February 2026.

The study of microbial communities, known as microbiomes, generates high-dimensional data consisting of the DNA sequences (genomes) of the microorganisms present in various samples. Each genome contains thousands of genes, which encode cellular effectors. Identifying and annotating these genes provides insights into the functions and roles of microbes, such as the molecules they consume or produce.
Among microbial ecosystems, the gut microbiome is one of the most extensively studied. Over the past two decades, the microbial DNA from thousands of samples has been sequenced, leading to the development of comprehensive databases[1,2] and a deeper understanding of the diversity and functions of microbial populations[3]. Nearly 300,000 genomes have been reconstructed and grouped into approximately 5,000 distinct microbial species. These genomes encompass several million genes, which can be clustered based on sequence similarity. That is hundreds of times the number of genes in the human genome. Over evolutionary history, the human body appears to have become dependent on some of these microbial genes to function, making them critical to human health.
This internship aims to investigate the distribution and co-occurrence of genes across the microbial genomes found in the human gut by applying dimensionality reduction to the vast collection of genes and genomes described above. Our team’s previous work has focused on applying dimensionality reduction to the taxonomic diversity of microbiomes in the human gut and in the environment[4,5]. In this project, the emphasis will be on microbial functions. The goal is to establish a basis for decomposing the gut microbial gene content of individuals into “functional profiles” that make it more interpretable and ease downstream analyses. The core dataset for obtaining these functional profiles consists of matrices recording the presence or absence of millions of genes in thousands of bacterial genomes.
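
To make the idea of "functional profiles" concrete, here is a minimal sketch of how a presence/absence matrix could be decomposed with NMF, one of the candidate techniques for this project. It is an illustration only, not the team's pipeline: the toy matrix, its size, the choice of three profiles, the multiplicative-update solver, and the names `nmf`, `genome_profile` and `gene_group` are all assumptions made for the example.

```python
import numpy as np

def nmf(X, n_profiles, n_iter=200, seed=0, eps=1e-9):
    """Factorise a non-negative matrix X (genomes x genes) as W @ H.

    Uses multiplicative updates for the squared Frobenius loss.
    W (genomes x profiles) gives the weight of each profile in each genome;
    H (profiles x genes) gives the gene content of each profile.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_profiles)) + eps
    H = rng.random((n_profiles, X.shape[1])) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 50 genomes, 120 genes, 3 hidden gene groups.
rng = np.random.default_rng(1)
genome_profile = rng.integers(0, 3, size=50)   # assumed profile of each genome
gene_group = rng.integers(0, 3, size=120)      # assumed group of each gene
X = (genome_profile[:, None] == gene_group[None, :]).astype(float)

W, H = nmf(X, n_profiles=3)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.3f}")
```

At the scale described above (millions of genes), a dense NumPy array would not be practical; sparse matrices and a scalable solver would be needed, which is part of what the internship is meant to explore.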

Objectives of the internship include:
- Reviewing common application domains of dimensionality reduction (e.g., image processing, text mining) to identify techniques that can scale up to the size of our data.
- In particular, exploring the applicability of Non-negative Matrix Factorisation (NMF), Latent Dirichlet Allocation (LDA), and Variational Autoencoder (VAE) techniques for the dimensionality reduction of our data.
- Implementing and comparing various algorithms.
- Investigating bi-cross-validation approaches for large datasets to select the optimal number of dimensions.
- Using the obtained functional profiles to characterise human gut microbiome samples.
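
As an illustration of the bi-cross-validation objective above, the sketch below holds out a block of rows and columns, fits a truncated SVD on the remaining block, and scores how well the held-out block is predicted. This is the SVD-based variant of bi-cross-validation, used here as a stand-in because it fits in a few lines; an NMF, LDA or VAE variant would replace the fitting step. The toy data, the 2x2 fold layout and the name `bcv_error` are assumptions for the example.

```python
import numpy as np

def bcv_error(X, rank, n_folds=2, seed=0):
    """Mean squared held-out error of a rank-`rank` truncated SVD.

    For each (row fold, column fold) pair, X is split as [[A, B], [C, D]],
    with A the held-out block. D is approximated at the given rank and
    A is predicted as B @ pinv(D_k) @ C.
    """
    rng = np.random.default_rng(seed)
    row_fold = rng.integers(0, n_folds, size=X.shape[0])
    col_fold = rng.integers(0, n_folds, size=X.shape[1])
    sq_err, n_entries = 0.0, 0
    for i in range(n_folds):
        for j in range(n_folds):
            rows, cols = row_fold == i, col_fold == j
            A = X[np.ix_(rows, cols)]
            B = X[np.ix_(rows, ~cols)]
            C = X[np.ix_(~rows, cols)]
            D = X[np.ix_(~rows, ~cols)]
            U, s, Vt = np.linalg.svd(D, full_matrices=False)
            # Never invert numerically zero singular values.
            k = min(rank, int(np.sum(s > 1e-10)))
            D_k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
            A_hat = B @ D_k_pinv @ C
            sq_err += np.sum((A - A_hat) ** 2)
            n_entries += A.size
    return sq_err / n_entries

# Hypothetical toy data: 60 genomes x 200 genes, 3 hidden profiles, 2% noise.
rng = np.random.default_rng(1)
genome_profile = rng.integers(0, 3, size=60)
gene_group = rng.integers(0, 3, size=200)
X = (genome_profile[:, None] == gene_group[None, :]).astype(float)
flip = rng.random(X.shape) < 0.02
X[flip] = 1 - X[flip]

errors = {k: bcv_error(X, k) for k in range(1, 7)}
for k, err in errors.items():
    print(f"rank {k}: held-out MSE = {err:.4f}")
```

The number of dimensions would then be chosen where the held-out error stops improving. On real data the full SVD of each retained block would be too expensive, so randomised or sparse solvers would be needed.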
If you are interested in computational biology and data science, and want to apply advanced machine learning techniques to real-world biological data, we encourage you to apply!

Expected skills and profile
- Expected:
  - Proficiency in Python
  - Good level of English (written and spoken)
- Appreciated:
  - Experience with high-performance computing
  - Bash scripting
  - Interest in microbiology applications
Working language: French or English

We are seeking a student with one of the following profiles:
– A Master’s degree in computer science or artificial intelligence,
– or a Master’s degree in computational biology with significant coursework in computer science.

References
1. Richardson, L. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res (2022) doi:10.1093/nar/gkac1080.
2. Gurbich, T. A. et al. MGnify Genomes: a resource for biome-specific microbial genome catalogues. J Mol Biol 168016 (2023) doi:10.1016/j.jmb.2023.168016.
3. Fan, Y. & Pedersen, O. Gut microbiota in human metabolic health and disease. Nat Rev Microbiol 1–17 (2020) doi:10.1038/s41579-020-0433-9.
4. Sommeria-Klein, G. et al. Global drivers of eukaryotic plankton biogeography in the sunlit ocean. Science 374, 594–599 (2021).
5. Frioux, C. et al. Enterosignatures define common bacterial guilds in the human gut microbiome. Cell Host Microbe (2023) doi:10.1016/j.chom.2023.05.024.

Application

Procedure: Apply by email to Guilhem Sommeria-Klein and Clémence Frioux: Guilhem.sommeria-klein@inria.fr; clemence.frioux@inria.fr

Deadline: Dec. 31, 2025

Contacts

Clémence Frioux
clemence.frioux@inria.fr

Guilhem Sommeria-Klein
guilhem.sommeria-klein@inria.fr

 https://github.com/cfrioux/cfrioux.github.io/blob/master/data/job_offers/offre_stageM2_2026_pleiade_GSK_CF.pdf

Offer published on Sept. 29, 2025, displayed until Dec. 31, 2025