Keywords
Protein-carbohydrate interactions
Deep learning
Molecular dynamics
Large-scale analysis
Description
Carbohydrates play essential biological roles as structural components, energy reservoirs, and key mediators of molecular communication at the cell surface. Their diverse architectures enable precise recognition events that regulate immunity, development, and host–pathogen interactions. As a result, protein–carbohydrate contacts influence processes ranging from tumor progression to viral and bacterial infection, making both carbohydrates and their binding proteins valuable targets for therapeutic design. However, comparing protein–carbohydrate interfaces remains challenging due to carbohydrate diversity, ligand flexibility, and experimental limitations.
In 2024, our group released the DIONYSUS database (1), which gathers the carbohydrate-containing structures from the Protein Data Bank and annotates them with all available general and carbohydrate-specific information on both proteins and ligands. Moreover, clustering the non-covalent carbohydrate-binding sites by their 3D geometry allowed us to identify functional annotations missing from state-of-the-art curated databases (2).
In its current state, DIONYSUS provides an integrated, user-friendly platform for exploring binding-site similarities, carbohydrate specificity, and complex quality. It offers a robust foundation for comparative analysis of carbohydrate-binding sites, as well as strong potential for developing deep learning methods that predict protein-carbohydrate interactions.
The student will have the opportunity to contribute to the following directions, which are being actively explored as the project develops:
* Development of protein-carbohydrate interaction prediction tools using advanced deep learning techniques such as diffusion models;
* Enrichment of DIONYSUS annotations with protein and carbohydrate flexibility information, drawing on recently released databases such as GlycoShape and GlycoShield as well as on molecular dynamics simulations;
* Evaluation of state-of-the-art protein-ligand modelling tools (such as AlphaFold3 and Boltz-2) on the task of modelling protein-carbohydrate interactions.
The ideal candidate should therefore be proficient in Python (in particular, scikit-learn and PyTorch), have experience in developing and applying machine learning models, and be familiar with basic structural bioinformatics concepts.
References:
(1) Gheeraert A, Bailly T, Ren Y, Hamraoui A, Te J, Vander Meersche Y, Cretin G, Leon Foun Lin R, Gelly J-C, Pérez S, Guyon F, Galochkina T. DIONYSUS: a database of protein-carbohydrate interfaces. Nucleic Acids Res 53(D1), D387–D395 (2025). DOI: 10.1093/nar/gkae890
(2) Gheeraert A, Guyon F, Pérez S, Galochkina T. Unraveling the diversity of protein-carbohydrate interfaces: insights from a multi-scale study. Carbohydr Res, 109377 (2025). DOI: 10.1016/j.carres.2025.109377
Other recent publications of the team:
Vander Meersche Y, Cretin G, Gheeraert A, Gelly J-C, Galochkina T. ATLAS: protein flexibility description from atomistic molecular dynamics simulations. Nucleic Acids Res 52(D1), D384–D392 (2024). DOI: 10.1093/nar/gkad1084
Vander Meersche Y, Diharce J, Gelly J-C, Galochkina T. Flexibility or uncertainty? A critical assessment of AlphaFold 2 pLDDT. Structure (2025). DOI: 10.1016/j.str.2025.09.001
Vander Meersche Y, Duval G, Cretin G, Gheeraert A, Gelly J-C, Galochkina T. PEGASUS: Prediction of MD-derived protein flexibility from sequence. Protein Science 34:e70221 (2025). DOI: 10.1002/pro.70221
Vander Meersche Y, Cretin G, de Brevern A G, Gelly J-C, Galochkina T. MEDUSA: Prediction of Protein Flexibility from Sequence. J Mol Biol 433(11), 166882 (2021). DOI: 10.1016/j.jmb.2021.166882