Predicting biomolecular interactions via deep learning

 Fixed-term contract (CDD) · PhD thesis · 36 months · Master's degree (Bac+5) · INRIA · Villers-lès-Nancy (France) · 2050 gross/month

 Start date: 1 October 2023

Keywords

Deep learning, protein-protein interactions, biomolecular interactions, protein design

Description

Location: Nancy, France (partial teleworking possible)

Thesis supervisor: Hamed Khakzad (CR INRIA, LORIA)

Starting date: Fall 2023 or early 2024, depending on candidate availability

Duration: 36 months

Funding: INRIA CPJ, already acquired

Salary: 2050 gross/month in the 1st and 2nd years, 2158 gross/month in the 3rd year.

Biological Context

Biomolecular interactions are at the heart of many biological pathways and play key roles in cellular processes. Some types of interaction, such as protein-protein interactions (PPIs), have been studied extensively over the past years, while many others remain largely difficult to predict.

Novel methodological approaches based on deep learning have started to make remarkable advances in protein structure prediction and design. Most impressively, learning from large datasets of sequences and structures has enabled structure prediction from sequence with remarkable accuracy. However, the performance of these methods in predicting PPIs is limited, and they do not generalize to other types of biomolecules.

Project

This project will attempt to create a broader methodological framework that can capture aspects of surface conformational diversity and interactions between proteins and different types of biomolecules, including small molecules (ligands), RNA, DNA, and other proteins. As an ultimate goal, this information can then be used to design new biomolecules against specific targets.

To achieve this goal, the candidate needs hands-on experience in developing deep learning models, together with solid knowledge of the state of the art. The main tasks of this project include: choosing biomolecular representations (how to provide a generalizable encoding of biomolecules), deciding how to use pre-trained models or transfer learning, collecting the data required for training, choosing a suitable architecture for the deep learning model (how to merge different types of information), and testing the final model on available benchmarks.

Working context

This position involves international collaboration (Lund University, Sweden and EPFL, Switzerland), with the possibility of on-site internships of 1-2 months.

The PhD candidate will be hosted in the CAPSID team at the LORIA lab (Inria) in Nancy (eastern France). The candidate will be supervised by Hamed Khakzad (Inria Junior Professor), with expertise in integrative structural biology, protein design, and deep learning [1,2,3], and Marie-Dominique Devignes (CNRS, HDR), an expert in data integration and knowledge discovery from biological databases [4]. The team consists of 7 permanent researchers with expertise in macromolecular interactions and docking, structural biology, and deep learning, together with several PhD and Master's students.

References

  1. S. Hauri and H. Khakzad et al., "Rapid determination of quaternary protein structures in complex biological samples," Nature Communications, vol. 10, no. 192, 2019.
  2. J. K. Leman et al., "Macromolecular modeling and design in Rosetta: recent methods and frameworks," Nature Methods, vol. 17, no. 7, 2020.
  3. C. Goverde et al., "De novo protein design by inversion of the AlphaFold structure prediction network," Protein Science, 2023.
  4. S. Z. Alborzi et al., "PPIDomainMiner: Inferring domain-domain interactions from multiple sources of protein-protein interactions," PLoS Computational Biology, vol. 17, no. 8, 2021.

Main activities

  1. Developing deep learning models
  2. Literature review (reported through monthly journal clubs)
  3. Implementing the method and preparing software in Python
  4. Validating the method and analysing the results
  5. Writing the dissertation and scientific articles, and presenting the work at international conferences

Requirements

  • Master's degree in Computer Science, Bioinformatics, or a related field
  • Proficiency in programming (Python) and good coding practices are required
  • Experience in deep learning (scikit-learn, PyTorch) is necessary
  • Skills in algorithm design
  • Ability to work both independently and in a team
  • Good oral and written English skills

Benefits package

  • Fully funded PhD position
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Applications must be sent to Hamed.Khakzad@inria.fr

 

Application

Procedure: applications should include

  • a detailed recent CV
  • a list of publications, if any
  • a cover letter describing the candidate's research interests and expertise relevant to the subject
  • the names of at least 2 scientists willing to provide a letter of recommendation, including the Master's supervisor
  • the Master's thesis reports, if available
  • links to the Master's/PhD thesis, if available
  • links to personal code repositories (such as GitHub), if any

Deadline: 1 October 2026

Contacts

 Hamed Khakzad

 Hamed.Khakzad@inria.fr

Offer published on 2 July 2023, displayed until 1 September 2023