Construction of an exhaustive database of three-residue fragments extracted from experimental protein structures

Host institution

7, Av. du Colonel Roche, 31400 Toulouse



Information extracted from experimentally determined protein structures is frequently used in computational biology. Statistics on the most frequent values of the backbone dihedral angles of a single amino acid residue are commonly used for conformational sampling of highly flexible proteins or regions [1]. However, such minimalistic single-residue fragments neglect important structural preferences that depend on the amino acid sequence. Structural libraries involving larger fragments (usually 3 to 14 residues) have been shown to be powerful tools for predicting probable (stable) conformations of globular proteins and peptides [2]. Since these fragment libraries were mainly conceived for protein structure prediction, they focus on the most probable conformations of small and medium-sized fragments; they are not sufficiently exhaustive for a wide sampling of the conformational space. Fragments involving three consecutive amino acid residues (called tripeptides from now on) represent a good trade-off between sequence-dependent structural preferences and exhaustiveness. Indeed, tripeptides contain relevant structural information [3], but are small enough to capture the conformational variability of the 20 proteinogenic amino acids. Recently, we showed that an extensive database of tripeptides allows accurate sampling of the conformational variability of protein loops [4] and intrinsically disordered proteins [5].

The goal of this project is the construction of an improved version of our tripeptide database, considering all the structures deposited in the Protein Data Bank (PDB). In addition to the extraction and organization of the information, a filtering stage will be implemented to avoid redundancies. This will involve sequence alignments and clustering. The new database will allow us to enhance the performance of the methods developed in our laboratory. Furthermore, a well-constructed and easily accessible database would be very useful to the computational structural biology community. Therefore, we aim to publish the results by the end of the project.
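To make the extraction and filtering stages more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the laboratory's actual pipeline: the function names (`extract_tripeptides`, `drop_near_duplicates`), the input layout (a one-letter sequence plus one (phi, psi) pair per residue of a continuous chain, all angles defined) and the 1-degree tolerance are hypothetical. The greedy near-duplicate filter is a deliberately simple stand-in for the sequence alignments and clustering mentioned above.

```python
from collections import defaultdict


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (periodic)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def extract_tripeptides(sequence, dihedrals):
    """Group the backbone dihedrals of every three-residue window by sequence.

    sequence  -- one-letter amino acid codes of one continuous chain (assumed)
    dihedrals -- list of (phi, psi) tuples in degrees, one per residue (assumed)
    """
    if len(sequence) != len(dihedrals):
        raise ValueError("sequence and dihedrals must have the same length")
    library = defaultdict(list)
    # Slide a window of three residues along the chain.
    for i in range(len(sequence) - 2):
        key = sequence[i:i + 3]
        library[key].append(tuple(dihedrals[i:i + 3]))
    return library


def is_similar(conf_a, conf_b, tolerance):
    """True if every corresponding angle differs by at most `tolerance` degrees."""
    return all(
        angle_diff(a, b) <= tolerance
        for res_a, res_b in zip(conf_a, conf_b)
        for a, b in zip(res_a, res_b)
    )


def drop_near_duplicates(conformations, tolerance=1.0):
    """Greedy filter: keep a conformation only if no kept one is similar to it."""
    kept = []
    for conf in conformations:
        if not any(is_similar(conf, other, tolerance) for other in kept):
            kept.append(conf)
    return kept


# Toy input: the tripeptide "ACD" occurs twice with identical angles.
sequence = "ACDACD"
dihedrals = [(-60.0, -45.0), (-120.0, 130.0), (-65.0, -40.0)] * 2
library = extract_tripeptides(sequence, dihedrals)
filtered = {key: drop_near_duplicates(confs) for key, confs in library.items()}
```

In this toy input, the two identical occurrences of the `ACD` window collapse into a single entry after filtering. A real implementation would also need to handle chain breaks, missing atoms and undefined terminal angles, which this sketch ignores.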


[1] Smith, L.J., Bolin, K.A., Schwalbe, H., MacArthur, M.W., Thornton, J.M., Dobson, C.M. Analysis of main chain torsion angles in proteins: Prediction of NMR coupling constants for native and random coil conformations. Journal of Molecular Biology 255(3), 494-506 (1996).

[2] Rohl, C.A., Strauss, C.E., Misura, K.M., Baker, D. Protein structure prediction using Rosetta. In: Numerical Computer Methods, Part D, Methods in Enzymology, vol. 383, pp. 66-93. Academic Press (2004).

[3] Huang, J.R., Ozenne, V., Jensen, M.R., Blackledge, M. Direct prediction of NMR residual dipolar couplings from the primary sequence of unfolded proteins. Angewandte Chemie-International Edition 52(2), 687-690 (2013).

[4] Barozet, A., Molloy, K., Vaisset, M., Simeon, T., Cortes, J. A reinforcement learning approach to enhance protein loop sampling. (In preparation)

[5] Estana, A., Sibille, N., Delaforge, E., Vaisset, M., Cortes, J., Bernado, P. Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database. Structure, in press (2018).

Expected skills:

The candidate should have good programming skills (mainly C++), familiarity with Linux, and knowledge of databases. Some background in structural biology would be an important plus. Teamwork skills are also essential for the success of the project.

Possibility of funding:

The student will receive a monthly stipend of around 550 euros for up to six months.


Please send an email containing your CV to Juan Cortés, indicating in the subject line “Candidate tripeptide database project”.
