Machine learning approaches for automatically detecting novel secretion systems

Internship · M2 internship · 6 months · Bac+5 / Master · TIMC · Grenoble (France)

Start date: 1 February 2023

Keywords

machine learning, genomics, secretion systems, deconvolution approaches

Description

Subject

Secretion systems are crucial for bacteria to interact with their environment: they are used to acquire nutrients, set up biotic defenses, and deliver virulence factors. Twelve bacterial secretion systems are currently known, varying in size from 1 to 15 proteins. Some organisms carry several copies of the same secretion system, and a single protein or group of proteins is sometimes involved in several secretion systems. Detecting which protein belongs to which secretion system is therefore a difficult task.

If we assume that we know all homologs (proteins belonging to a family with a common ancestor) involved in all secretion systems, can we find which homolog is involved in which secretion system? We propose to formulate this task as a deconvolution problem, solved for example with non-negative matrix factorization (NMF). Given a matrix X where each row corresponds to an organism, each column to a homolog, and each entry to the number of times the homolog is found in the organism, the goal is to factorize X into two matrices: V, which describes which homologs are found in which type of secretion system, and U, which gives the number of secretion systems of each type in each organism.
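
The factorization described above can be illustrated with a minimal sketch. It assumes the convention X ≈ UV, with U of size organisms × systems and V of size systems × homologs, and it uses the standard multiplicative updates for the squared-error loss; neither choice is stated in the offer. The function name `nmf`, the toy matrices `U_true` and `V_true`, and all parameter values are made up for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative matrix X (organisms x homologs) as X ~ U @ V.

    U (organisms x k): number of copies of each system in each organism.
    V (k x homologs):  number of times each homolog occurs in each system.
    Uses multiplicative updates for the Frobenius (squared-error) loss.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps
    V = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X) / (U.T @ U @ V + eps)
    return U, V

# Toy data (assumption, not from the real database):
# 2 secretion systems sharing the third homolog, 6 organisms.
V_true = np.array([[1, 1, 1, 0, 0],
                   [0, 0, 1, 1, 1]], dtype=float)
U_true = np.array([[1, 0], [0, 1], [2, 0],
                   [1, 1], [0, 2], [2, 1]], dtype=float)
X = U_true @ V_true  # homolog counts per organism

U, V = nmf(X, k=2)
rel_error = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

The recovered factors are only defined up to a permutation and rescaling of the k components, so comparing them with known secretion systems would require an extra matching or normalization step.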

This internship aims to address several challenges of applying NMF approaches to this problem: tuning the hyper-parameters of the model, including prior knowledge through penalization approaches, …
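
As one possible illustration of these two challenges, the sketch below adds an L1 penalty to both factors and scans a small grid over the number of components k and the penalty weight. The specific penalty, the grid values, the function name `penalized_nmf`, and the toy count matrix are all assumptions chosen for the example, not the method planned for the internship.

```python
import numpy as np

def penalized_nmf(X, k, lam=0.0, n_iter=500, eps=1e-9, seed=0):
    """Minimize 0.5 * ||X - U V||_F^2 + lam * (sum(U) + sum(V)), with U, V >= 0."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k)) + eps
    V = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        # The L1 penalty only adds `lam` to the denominator of each update.
        U *= (X @ V.T) / (U @ V @ V.T + lam + eps)
        V *= (U.T @ X) / (U.T @ U @ V + lam + eps)
    return U, V

# Made-up toy counts: 6 organisms x 5 homologs, generated from 2 systems.
X = np.array([[1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1],
              [2, 2, 2, 0, 0],
              [1, 1, 2, 1, 1],
              [0, 0, 2, 2, 2],
              [2, 2, 3, 1, 1]], dtype=float)

# Grid over the two hyper-parameters: rank k and penalty weight lam.
results = {}
for k in (1, 2, 3):
    for lam in (0.0, 0.1):
        U, V = penalized_nmf(X, k, lam)
        results[(k, lam)] = np.linalg.norm(X - U @ V) / np.linalg.norm(X)
        print(f"k={k} lam={lam}: relative error {results[(k, lam)]:.3f}")
```

Reconstruction error on the fitted data tends to keep decreasing as k grows, so on its own it is not a sufficient selection criterion; held-out entries or ground-truth annotations would be needed to choose between settings.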

Data

We have access to a large database of bacterial and archaeal genomes, in which each protein has been annotated, through sequence similarity, as potentially being part of specific secretion systems. We also have ground-truth information on the presence or absence of secretion systems, which makes this an ideal dataset for developing and testing deconvolution approaches on real-world problems of major importance.

Applications

We are looking for a first- or second-year master's student, or equivalent, for a 4- to 6-month internship.

How to apply

Procedure: Send your application (CV and cover letter) to nelle.varoquaux@univ-grenoble-alpes.fr

Deadline: 15 January 2023

Contacts

Nelle Varoquaux

nelle.varoquaux@univ-grenoble-alpes.fr

https://tree-timc.github.io/compbio/files/2022_M2_homologs_SS.pdf

Offer published on 14 November 2022, displayed until 13 January 2023